Wikipedia

Speech recognition software for Linux

Since the early 2000s, several speech recognition (SR) software packages have existed for Linux. Some are free and open-source software; others are proprietary. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language, whereas voice control may refer to software used to communicate operational commands to a computer.

Linux native speech recognition

History

In the late 1990s, IBM made a Linux version of its ViaVoice software available to users at no charge. In 2002, IBM withdrew the free software development kit (SDK).

Development status

In the early 2000s, there was a push to develop a high-quality, Linux-native speech recognition engine. As a result, several projects dedicated to creating Linux speech recognition programs were started. A later example is Mycroft, an open-source voice assistant similar to Microsoft Cortana.

Speech sample crowdsourcing

Compiling a speech corpus is essential for producing the acoustic models used in speech recognition projects. VoxForge is a free speech corpus and acoustic model repository built to collect transcribed speech for use in such projects. VoxForge accepts crowdsourced speech samples and corrections of recognized speech sequences. It is licensed under the GNU General Public License (GPL).
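To make "compiling a speech corpus" concrete, the sketch below turns crowdsourced recordings into the plain-text data files used by Kaldi (one of the free engines listed in this article): `text` (utterance ID and transcript), `wav.scp` (utterance ID and audio path) and `utt2spk` (utterance ID and speaker). It is a minimal illustration under stated assumptions: the `Record` type, its field names and the example paths are hypothetical, and VoxForge's own submission format is not modelled. Only the three output layouts follow Kaldi's data-directory convention.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Record:
    """One crowdsourced utterance (hypothetical structure, not a VoxForge format)."""
    speaker: str      # contributor ID
    utt_id: str       # utterance ID, unique per speaker
    wav_path: str     # path to the recorded WAV file
    transcript: str   # the prompt text the contributor read aloud


def kaldi_files(records: list[Record]) -> dict[str, str]:
    """Build the contents of Kaldi-style text, wav.scp and utt2spk files."""
    rows: dict[str, Record] = {}
    for r in records:
        # Prefix utterance IDs with the speaker ID so that sorting by
        # utterance also keeps each speaker's recordings together.
        utt = f"{r.speaker}-{r.utt_id}"
        if any(ch.isspace() for ch in utt):
            raise ValueError(f"IDs must not contain whitespace: {utt!r}")
        if utt in rows:
            raise ValueError(f"duplicate utterance ID: {utt!r}")
        rows[utt] = r

    text, wav_scp, utt2spk = [], [], []
    # Kaldi expects these files sorted in C-locale (byte) order; Python's
    # default string ordering gives the same result for ASCII IDs.
    for utt in sorted(rows):
        r = rows[utt]
        # Collapse runs of whitespace so each transcript stays on one line.
        text.append(f"{utt} {' '.join(r.transcript.split())}")
        wav_scp.append(f"{utt} {r.wav_path}")
        utt2spk.append(f"{utt} {r.speaker}")

    files = (("text", text), ("wav.scp", wav_scp), ("utt2spk", utt2spk))
    return {name: "".join(line + "\n" for line in lines) for name, lines in files}


def write_data_dir(records: list[Record], out_dir: Path) -> None:
    """Write the three files into out_dir, creating it if missing."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, contents in kaldi_files(records).items():
        (out_dir / name).write_text(contents, encoding="utf-8")
```

The matching `spk2utt` file can then be derived from `utt2spk` with Kaldi's `utils/utt2spk_to_spk2utt.pl` script.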

Speech recognition concept

The first step is to begin recording an audio stream on a computer. The user has two main processing options:

  • Discrete speech recognition (DSR) – all processing happens on the local machine. These are self-contained systems in which every stage of SR runs within the user's computer. As of 2018, this was becoming critical for protecting intellectual property (IP) and avoiding unwanted surveillance.
  • Remote or server-based SR – the recorded audio is transmitted to a remote server, which converts it into text. Because of cloud storage schemes and data mining, this method more easily exposes users to surveillance, information theft, and malware injection.

Smartphones originally relied on remote recognition because they lacked the processing power, working memory, or storage to perform speech recognition on the device. These limits have largely been overcome, although server-based SR remains common on mobile devices.
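The practical difference between the two processing options is where the recording goes. The Python sketch below is illustrative only: `capture` synthesizes a tone instead of reading a microphone, `local_decode` is a hypothetical placeholder for an on-device engine (it reports the audio duration rather than recognizing words), and `RECOGNIZER_URL` is a placeholder on the reserved `.invalid` domain, not a real service. The 16 kHz, 16-bit mono PCM format is an assumption, though it is a format many engines accept.

```python
import io
import math
import struct
import urllib.request
import wave

SAMPLE_RATE = 16_000  # assumed format: 16 kHz, mono, 16-bit PCM
RECOGNIZER_URL = "https://asr.example.invalid/recognize"  # placeholder, not a real service


def capture(seconds: float = 0.5) -> bytes:
    """Stand-in for microphone capture: synthesize a 440 Hz tone as WAV bytes."""
    n = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(8000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)
    return buf.getvalue()


def local_decode(wav_bytes: bytes) -> str:
    """DSR placeholder (hypothetical): the audio never leaves this process."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        duration = w.getnframes() / w.getframerate()
    return f"<local transcript of {duration:.2f}s of audio>"


def build_remote_request(wav_bytes: bytes) -> urllib.request.Request:
    """Remote SR: the raw recording itself is the payload sent off the machine."""
    return urllib.request.Request(
        RECOGNIZER_URL,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )
```

`build_remote_request` only constructs the HTTP request; in a real remote setup the same bytes would be passed to `urllib.request.urlopen`, which is exactly the point at which the recording leaves the user's computer.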

Speech recognition in browser

Discrete speech recognition can also be performed within a web browser, and works well in browsers that support it. Remote SR does not require installing software on a desktop computer or mobile device because it is mainly server-based, which brings the security issues noted above.

  • Remote: The dictation service records an audio track of the user via a web browser.
  • DSR: Some solutions work on a client only, without sending data to servers.

Free speech recognition engines

The following is a list of projects dedicated to implementing speech recognition in Linux, together with major native solutions. These are not end-user applications but programming libraries and resources that may be used to develop end-user applications.

  • CMU Sphinx is a general term to describe a group of speech recognition systems developed at Carnegie Mellon University.
  • HTK was the most famous and widely used speech recognition toolkit before Kaldi.
  • Julius is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers.
  • Kaldi is a speech recognition toolkit released under the Apache License 2.0.
  • Mozilla DeepSpeech is an open-source speech-to-text engine based on Baidu's Deep Speech research paper.[1]
  • VoxForge is a free speech corpus and acoustic model repository for open-source speech recognition engines.

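Proprietary speech recognition engines

  • Janus Recognition Toolkit (JRTk) is a closed-source speech recognition toolkit mainly targeted at Linux. It was developed by the Interactive Systems Laboratories at Carnegie Mellon University and the Karlsruhe Institute of Technology, and commercial and research licenses are available.[2]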

Voice control and keyboard shortcuts

Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for sending operational commands to a computer or appliance. Voice control typically requires a much smaller vocabulary and thus is much easier to implement.
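One way a voice-control program can exploit its small vocabulary is to give the recognizer a grammar listing the only phrases it should expect. The sketch below, assuming a hypothetical command list, emits such a grammar in the JSpeech Grammar Format (JSGF), a format CMU Sphinx can load. The `to_jsgf` helper and its restriction to plain lowercase words are illustrative choices, not part of any library.

```python
import re

# Hypothetical command vocabulary for a small voice-control program.
COMMANDS = ["open file", "save file", "close window", "next tab"]

# Plain lowercase words separated by single spaces: no JSGF escaping needed.
_PHRASE = re.compile(r"^[a-z]+(?: [a-z]+)*$")


def to_jsgf(commands: list[str], name: str = "commands") -> str:
    """Return a JSGF grammar whose single public rule matches exactly these phrases."""
    if not commands:
        raise ValueError("a grammar needs at least one phrase")
    for phrase in commands:
        if not _PHRASE.match(phrase):
            raise ValueError(f"unsupported phrase: {phrase!r}")
    alternatives = " | ".join(commands)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <command> = {alternatives};\n"
    )
```

With such a grammar, the decoder only has to choose among a handful of phrases instead of a vocabulary of thousands of words, which is why voice control is comparatively easy to make accurate.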

Simple software combined with keyboard shortcuts offers the most immediate route to practically accurate voice control in Linux.
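Pairing a small command set with keyboard shortcuts can be sketched as follows. Assumptions: the recognized phrase comes from some engine not shown here, the `SHORTCUTS` table is hypothetical, and keystrokes are sent with `xdotool key` (an X11 utility that must be installed separately). Python's `difflib` lets the mapping tolerate small recognition errors, such as "safe file" heard for "save file".

```python
import difflib
import shutil
import subprocess
from typing import Optional

# Hypothetical mapping from spoken command to an xdotool key sequence.
SHORTCUTS = {
    "save file": "ctrl+s",
    "close window": "ctrl+w",
    "next tab": "ctrl+Tab",
    "undo": "ctrl+z",
}


def match_command(heard: str, cutoff: float = 0.75) -> Optional[str]:
    """Return the closest known command, or None if nothing is close enough."""
    hits = difflib.get_close_matches(
        heard.lower().strip(), list(SHORTCUTS), n=1, cutoff=cutoff
    )
    return hits[0] if hits else None


def shortcut_argv(heard: str) -> Optional[list[str]]:
    """Translate a recognized phrase into an xdotool command line."""
    command = match_command(heard)
    if command is None:
        return None
    return ["xdotool", "key", SHORTCUTS[command]]


def run(heard: str, dry_run: bool = True) -> Optional[list[str]]:
    """Send the shortcut unless dry_run is set or xdotool is not installed."""
    argv = shortcut_argv(heard)
    if argv and not dry_run and shutil.which("xdotool"):
        subprocess.run(argv, check=True)
    return argv
```

`run` defaults to a dry run that only returns the command line; passing `dry_run=False` executes it when `xdotool` is on the `PATH`.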

Running Windows speech recognition software with Linux

Via compatibility layer

It is possible to run programs such as Dragon NaturallySpeaking on Linux using Wine, though some problems may arise depending on which version is used.[3]

Via virtualized Windows

It is also possible to use Windows speech recognition software under Linux by running Windows and NaturallySpeaking in no-cost virtualization software. VMware Server and VirtualBox both support copying and pasting between the host and a virtual machine, so dictated text can easily be transferred to or from the virtual machine.

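See also

  • List of speech recognition software
  • Speech interface guideline – Guideline for designing interfaces operated by human voice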

References

  1. ^ "A TensorFlow implementation of Baidu's DeepSpeech architecture". Mozilla. 2017-12-05. Retrieved 2017-12-05.
  2. ^ Roedder, Margit; IAR (26 January 2018). "KIT – Janus Recognition Toolkit". isl.ira.uka.de.
  3. ^ "WineHQ – Dragon Naturally Speaking". appdb.winehq.org.

External links

  • Accessibility, SpeechRecognition – Ubuntu Help

Retrieved from https://en.wikipedia.org/w/index.php?title=Speech_recognition_software_for_Linux&oldid=1144588586