Wikipedia

Speech coding

Speech coding is an application of data compression to digital audio signals containing speech. It uses speech-specific parameter estimation, based on audio signal processing techniques, to model the speech signal. It then uses generic data compression algorithms to represent the resulting model parameters in a compact bitstream.[1]
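The last step, turning model parameters into a compact bitstream, can be sketched as scalar quantization followed by bit-packing. This is a minimal illustration only. The field names and bit widths below (`pitch_lag`, `gain` and `codebook_index` at 7, 5 and 10 bits) are hypothetical and do not come from any particular standard. Real codecs also use more elaborate quantizers, such as vector quantization.

```python
def quantize_uniform(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a continuous parameter in [lo, hi] to an integer index with 2**bits levels."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def pack_fields(values: list[int], widths: list[int]) -> bytes:
    """Concatenate unsigned fields MSB-first and zero-pad to a whole number of bytes."""
    acc, total = 0, 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        total += width
    pad = (-total) % 8
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack_fields(data: bytes, widths: list[int]) -> list[int]:
    """Inverse of pack_fields: read the same fields back, MSB-first."""
    total = sum(widths)
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - total)
    values, shift = [], total
    for width in widths:
        shift -= width
        values.append((acc >> shift) & ((1 << width) - 1))
    return values


# Hypothetical frame layout: pitch_lag (7 bits), gain (5 bits), codebook_index (10 bits)
WIDTHS = [7, 5, 10]
gain_index = quantize_uniform(0.42, lo=0.0, hi=1.0, bits=5)
frame = pack_fields([100, gain_index, 700], WIDTHS)
```

Here 22 bits of parameters fit in 3 bytes, and `unpack_fields(frame, WIDTHS)` returns `[100, 13, 700]`.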

Common applications of speech coding are mobile telephony and voice over IP (VoIP).[2] The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while the most widely used in VoIP applications are the LPC and modified discrete cosine transform (MDCT) techniques.[citation needed]

The techniques used in speech coding are similar to those used in audio data compression and audio coding. In those fields, knowledge of psychoacoustics is used to transmit only the data that is relevant to the human auditory system. For example, voiceband speech coding transmits only the information in the 400–3500 Hz band, yet the reconstructed signal remains adequately intelligible.
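A band-pass FIR filter can illustrate how a signal is restricted to the 400–3500 Hz voiceband. The sketch below makes several assumptions that are not stated above:

- an 8 kHz sampling rate, which is typical for narrowband telephony;
- a Hamming-windowed-sinc design, built as the difference of two low-pass filters;
- an arbitrary tap count of 101.

It is not the filter of any particular codec. In practice, band-limiting is usually done with library filter-design routines.

```python
import math


def lowpass_taps(cutoff_hz: float, fs_hz: float, num_taps: int) -> list[float]:
    """Hamming-windowed sinc low-pass FIR; num_taps should be odd for a centered tap."""
    m = num_taps - 1
    fc = cutoff_hz / fs_hz  # normalized cutoff, cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - m / 2
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    return taps


def bandpass_taps(low_hz: float, high_hz: float, fs_hz: float, num_taps: int = 101) -> list[float]:
    """Band-pass = low-pass at the upper edge minus low-pass at the lower edge."""
    upper = lowpass_taps(high_hz, fs_hz, num_taps)
    lower = lowpass_taps(low_hz, fs_hz, num_taps)
    return [u - l for u, l in zip(upper, lower)]


def gain_at(taps: list[float], freq_hz: float, fs_hz: float) -> float:
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)


def fir_filter(taps: list[float], samples: list[float]) -> list[float]:
    """Direct-form convolution, treating samples before the start as zero."""
    return [
        sum(t * samples[i - j] for j, t in enumerate(taps) if i - j >= 0)
        for i in range(len(samples))
    ]


voiceband = bandpass_taps(400, 3500, 8000)
```

With these settings, the gain is close to 1 at 1 kHz and close to 0 at 50 Hz and 3.9 kHz. A tone outside the band is therefore strongly attenuated by `fir_filter(voiceband, samples)`.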

Speech coding differs from other forms of audio coding in that speech is a simpler signal than other audio signals, and statistical information is available about the properties of speech. As a result, some auditory information that is relevant in general audio coding can be unnecessary in the speech coding context. Speech coding stresses the preservation of intelligibility and pleasantness of speech while using a constrained amount of transmitted data.[3] In addition, most speech applications require low coding delay, as latency interferes with speech interaction.[4]
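The delay requirement can be made concrete with the algorithmic delay of a frame-based coder. The encoder must buffer a whole frame, plus any look-ahead, before it can start encoding. The figures below are the commonly quoted values for two ITU-T codecs at 8 kHz, included only as illustrations:

- G.729: 10 ms frames with 5 ms of look-ahead;
- G.728 LD-CELP: 5-sample vectors with no look-ahead.

In a real system, processing and transmission delays add to these figures.

```python
def algorithmic_delay_ms(frame_samples: int, lookahead_samples: int, fs_hz: int) -> float:
    """Delay from buffering one frame plus look-ahead before encoding can start."""
    return 1000.0 * (frame_samples + lookahead_samples) / fs_hz


# G.729: 80-sample (10 ms) frames plus 40-sample (5 ms) look-ahead at 8 kHz
g729_ms = algorithmic_delay_ms(80, 40, 8000)
# G.728 LD-CELP: 5-sample vectors and no look-ahead at 8 kHz
g728_ms = algorithmic_delay_ms(5, 0, 8000)
print(f"G.729: {g729_ms} ms, G.728: {g728_ms} ms")  # prints "G.729: 15.0 ms, G.728: 0.625 ms"
```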

Categories

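Speech coders are of two classes:[5]

  1. Waveform coders
       • Time domain: PCM, ADPCM
       • Frequency domain: sub-band coding, ATRAC
  2. Vocoders
       • Linear predictive coding (LPC)
       • Formant coding
       • Machine learning, i.e. neural vocoder[6]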
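The waveform-coder idea can be illustrated with plain DPCM, the non-adaptive ancestor of the ADPCM listed above. Each sample is predicted from the previous reconstructed sample, and only the quantized prediction error is sent. The fixed step size and the 4-bit code range are assumptions for illustration. Real ADPCM codecs such as G.726 also adapt the step size and use higher-order predictors.

```python
import math


def dpcm_encode(samples: list[float], step: float) -> list[int]:
    """Quantize the difference between each sample and the previous reconstruction."""
    codes, reconstructed = [], 0.0
    for x in samples:
        code = round((x - reconstructed) / step)
        code = max(-8, min(7, code))  # clamp to a signed 4-bit range
        codes.append(code)
        reconstructed += code * step  # track exactly what the decoder will rebuild
    return codes


def dpcm_decode(codes: list[int], step: float) -> list[float]:
    """Accumulate the quantized differences to rebuild the waveform."""
    out, reconstructed = [], 0.0
    for code in codes:
        reconstructed += code * step
        out.append(reconstructed)
    return out


# A 200 Hz tone sampled at 8 kHz, coded with 4 bits per sample
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
codes = dpcm_encode(tone, step=0.05)
decoded = dpcm_decode(codes, step=0.05)
max_error = max(abs(a - b) for a, b in zip(tone, decoded))
```

The encoder tracks the decoder's reconstruction in a closed loop, so the error stays within half a step as long as each difference fits in the code range. A signal that changes by more than about 7 steps per sample would be clipped (slope overload).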

Sample companding viewed as a form of speech coding

The A-law and μ-law algorithms used in G.711 PCM digital telephony can be seen as an early precursor of speech coding. They require only 8 bits per sample but give effectively 12 bits of resolution.[7] Logarithmic companding keeps the quantization step roughly proportional to signal amplitude. This is consistent with human hearing perception: low-amplitude noise is audible alongside a low-amplitude speech signal but is masked by a high-amplitude one. Such companding would cause unacceptable distortion in a music signal. Speech, however, has peaky waveforms and a simple frequency structure: a periodic waveform with a single fundamental frequency and occasional added noise bursts. Together, these properties make these very simple instantaneous compression algorithms acceptable for speech.[citation needed][dubious]
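Companding can be sketched with the continuous μ-law curve, assuming μ = 255 and a roughly 8-bit uniform quantizer applied after compression. This is an illustration, not the G.711 algorithm itself. G.711 uses a segmented, piecewise-linear approximation of the curve with a specific 8-bit code format. The sketch shows that companding spends quantization levels on small amplitudes, so quiet samples keep a small relative error.

```python
import math

MU = 255  # value used by the North American and Japanese variant of G.711


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Continuous mu-law curve: maps x in [-1, 1] to y in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize(y: float, levels: int = 127) -> float:
    """Uniform quantizer with 2 * levels + 1 steps on [-1, 1] (roughly 8 bits)."""
    return max(-levels, min(levels, round(y * levels))) / levels


def relative_error(x: float, companded: bool) -> float:
    """Relative reconstruction error for one nonzero sample, with or without companding."""
    if companded:
        x_hat = mu_law_expand(quantize(mu_law_compress(x)))
    else:
        x_hat = quantize(x)
    return abs(x - x_hat) / abs(x)


for amplitude in (0.01, 0.5):
    print(amplitude, relative_error(amplitude, False), relative_error(amplitude, True))
```

The two quantizers compare as follows:

- For a quiet sample (amplitude 0.01), the uniform quantizer's relative error is about 21%, while the companded path's is about 0.1%.
- For a loud sample (0.5), the two are comparable, and both are below 1%.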

A wide variety of other algorithms, mostly delta modulation variants, were tried at the time. After careful consideration, however, the designers of the early digital telephony systems chose the A-law/μ-law algorithms. When they were designed, their 33% bandwidth reduction relative to 12-bit linear PCM, achieved at very low complexity, was an excellent engineering compromise. Their audio performance remains acceptable, and there has been no need to replace them in the fixed-line telephone network.[citation needed]
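The 33% figure is the saving of 8-bit companded samples over the roughly 12-bit linear resolution they provide. At G.711's sampling rate of 8000 samples per second (a standard value not stated above), the arithmetic is:

$$
1 - \frac{8}{12} = \frac{1}{3} \approx 33\%, \qquad
8000 \times 8 = 64\ \text{kbit/s} \quad \text{versus} \quad 8000 \times 12 = 96\ \text{kbit/s}.
$$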

In 2008, ITU-T standardized the G.711.1 codec, which has a scalable structure and an input sampling rate of 16 kHz.[8]

Modern speech compression

Much of the later work in speech compression was motivated by military research into digital communications for secure military radios. These radios needed very low data rates to operate effectively in a hostile radio environment. At the same time, VLSI circuits made far more processing power available than earlier compression techniques had been able to use. As a result, modern speech compression algorithms could use far more complex techniques than those available in the 1960s and achieve far higher compression ratios.

The most widely used speech coding algorithms are based on linear predictive coding (LPC).[9] In particular, the most common speech coding scheme is LPC-based code-excited linear prediction (CELP), which is used, for example, in the GSM standard. CELP divides the modeling into two stages:

  1. A linear predictive stage models the spectral envelope.
  2. A codebook-based model represents the residual of the linear predictive model.

The linear prediction coefficients are computed and quantized, usually as line spectral pairs (LSPs). Besides the speech coding itself, transmission often also requires channel coding to avoid losses due to transmission errors. For the best overall results, speech coding and channel coding methods are chosen in pairs, and the more important bits in the speech data stream are protected by more robust channel coding.
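The two CELP stages can be sketched under simplifying assumptions:

- The LPC coefficients are obtained from the autocorrelation with the Levinson–Durbin recursion. This is a standard method, not necessarily the one any given codec uses.
- The residual is computed with the resulting prediction-error filter.
- The residual is matched against a small codebook by choosing the codevector and gain with the least squared error.

A real CELP encoder differs in important ways. It windows the signal and converts the coefficients to LSPs before quantizing them. It also searches the codebook by analysis-by-synthesis, filtering each candidate through the synthesis filter and a perceptual weighting filter. The AR(2) test signal and its coefficients are arbitrary stand-ins for speech.

```python
import random


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Biased autocorrelation r[0..max_lag]."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Solve for a[1..order] in x[n] ~ sum_k a[k] x[n-k]; also return the residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x: list[float], coeffs: list[float]) -> list[float]:
    """Prediction error e[n] = x[n] - sum_k a[k] x[n-k] (samples before the start are zero)."""
    return [
        x[n] - sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        for n in range(len(x))
    ]


def search_codebook(target: list[float], codebook: list[list[float]]) -> tuple[int, float]:
    """Pick the codevector and gain minimizing squared error (codevectors assumed nonzero)."""
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for index, vector in enumerate(codebook):
        energy = sum(v * v for v in vector)
        corr = sum(t * v for t, v in zip(target, vector))
        score = corr * corr / energy  # error reduction achieved with the optimal gain
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain


# Synthetic "speech-like" AR(2) signal with known coefficients 1.3 and -0.6
rng = random.Random(0)
signal = [0.0, 0.0]
for _ in range(8000):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2] + rng.gauss(0.0, 1.0))
coeffs, residual_energy = levinson_durbin(autocorrelation(signal, 2), order=2)
residual = lpc_residual(signal, coeffs)
```

The recovered coefficients come out close to 1.3 and −0.6. The residual carries much less energy than the signal, which makes it cheaper to code.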

The modified discrete cosine transform (MDCT) is the basis of the LD-MDCT technique used by the AAC-LD format, introduced in 1999.[10] MDCT has since been widely adopted in VoIP applications, such as the G.729.1 wideband audio codec introduced in 2006,[11] Apple's FaceTime (using AAC-LD) introduced in 2010,[12] and the CELT codec introduced in 2011.[13]
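The MDCT suits frame-based coding because of time-domain aliasing cancellation. Frames overlap by 50%, and each frame of 2N samples yields only N coefficients, yet overlap-adding the inverse transforms still reconstructs the input exactly. The sketch below checks this with a direct O(N²) MDCT and a sine window. It is a textbook formulation for illustration only; practical codecs use FFT-based fast algorithms and their own window shapes. With a window satisfying the Princen-Bradley condition, the inverse transform is scaled by 2/N.

```python
import math
import random


def sine_window(two_n: int) -> list[float]:
    """Princen-Bradley sine window: w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def _basis(n: int, k: int, half: int) -> float:
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))


def mdct(block: list[float]) -> list[float]:
    """2N input samples -> N coefficients."""
    half = len(block) // 2
    return [sum(x * _basis(n, k, half) for n, x in enumerate(block)) for k in range(half)]


def imdct(coeffs: list[float]) -> list[float]:
    """N coefficients -> 2N time-aliased samples, scaled by 2/N for a Princen-Bradley window."""
    half = len(coeffs)
    return [2.0 / half * sum(c * _basis(n, k, half) for k, c in enumerate(coeffs))
            for n in range(2 * half)]


def mdct_roundtrip(signal: list[float], half: int) -> list[float]:
    """Window, transform, inverse-transform and overlap-add with hop size N."""
    if len(signal) % half:
        raise ValueError("signal length must be a multiple of N")
    window = sine_window(2 * half)
    padded = [0.0] * half + signal + [0.0] * half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        frame = [w * x for w, x in zip(window, padded[start:start + 2 * half])]
        restored = imdct(mdct(frame))
        for n, (w, y) in enumerate(zip(window, restored)):
            out[start + n] += w * y  # synthesis window, then overlap-add
    return out[half:half + len(signal)]


rng = random.Random(1)
original = [rng.uniform(-1.0, 1.0) for _ in range(64)]
rebuilt = mdct_roundtrip(original, half=16)
```

The difference between `original` and `rebuilt` is at floating-point rounding level, even though each 32-sample frame was represented by only 16 coefficients.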

Opus is a free software audio codec. It combines the speech-oriented, LPC-based SILK algorithm and the lower-latency, MDCT-based CELT algorithm, switching between them or combining them as needed for maximal efficiency.[14][15] It is used for VoIP calls in WhatsApp.[16][17][18] The PlayStation 4 video game console also uses Opus for PlayStation Network party chat.[19]

A number of codecs with even lower bit rates have been demonstrated:

- Codec2 operates at bit rates as low as 450 bit/s and is used in amateur radio.[20]
- MELPe, currently used by NATO, offers intelligible speech at 600 bit/s and below.[21]
- Google's Lyra, a neural vocoder, gives "almost eerie" quality at 3 kbit/s.[22]
- Microsoft's Satin also uses machine learning, but operates at a higher, tunable bit rate and is wideband.[23]
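For scale, these rates can be compared with the 64 kbit/s of G.711 PCM, which is 8000 samples per second at 8 bits each (a figure not stated above). The sketch simply divides that reference rate by the bit rates quoted in this paragraph.

```python
G711_BPS = 8000 * 8  # G.711: 8000 samples/s at 8 bits per sample = 64 kbit/s


def compression_ratio(bitrate_bps: float, reference_bps: float = G711_BPS) -> float:
    """How many times fewer bits per second than the reference codec."""
    return reference_bps / bitrate_bps


LOW_RATE_CODECS = {"Codec2 at 450 bit/s": 450, "MELPe at 600 bit/s": 600, "Lyra at 3 kbit/s": 3000}
for name, bps in LOW_RATE_CODECS.items():
    print(f"{name}: about {compression_ratio(bps):.0f} times below G.711")
```

This gives roughly 142, 107 and 21 times fewer bits per second than G.711.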

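Sub-fields

Wideband audio coding
  • Linear predictive coding (LPC): AMR-WB for WCDMA networks; VMR-WB for CDMA2000 networks; Speex, IP-MR, SILK (part of Opus) and USAC/xHE-AAC for VoIP and videoconferencing
  • Modified discrete cosine transform (MDCT): AAC-LD, G.722.1, G.729.1, CELT and Opus for VoIP and videoconferencing
  • Adaptive differential pulse-code modulation (ADPCM): G.722 for VoIP
  • Neural speech coding:
      • Lyra (Google): V1 uses neural network reconstruction of log-mel spectrograms; V2 is an end-to-end autoencoder
      • Satin (Microsoft)
      • LPCNet (Mozilla, Xiph): neural network reconstruction of LPC features[24]

Narrowband audio coding
  • LPC: FNBDT for military applications; SMV for CDMA networks; Full Rate, Half Rate, EFR and AMR for GSM networks; G.723.1, G.728, G.729, G.729.1 and iLBC for VoIP or videoconferencing
  • ADPCM: G.726 for VoIP
  • Multi-Band Excitation (MBE): AMBE for digital mobile radio and satellite phones; Codec 2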

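See also

  • Digital signal processing
  • Speech interface guideline
  • Speech processing
  • Speech synthesis
  • Vector quantization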

References

  1. M. Arjona Ramírez and M. Minami, "Low bit rate speech coding," in Wiley Encyclopedia of Telecommunications, J. G. Proakis, Ed., New York: Wiley, 2003, vol. 3, pp. 1299–1308.
  2. M. Arjona Ramírez and M. Minami, "Technology and standards for low-bit-rate vocoding methods," in The Handbook of Computer Networks, H. Bidgoli, Ed., New York: Wiley, 2011, vol. 2, pp. 447–467.
  3. P. Kroon, "Evaluation of speech coders," in Speech Coding and Synthesis, W. Bastiaan Kleijn and K. K. Paliwal, Eds., Amsterdam: Elsevier Science, 1995, pp. 467–494.
  4. J. H. Chen, R. V. Cox, Y.-C. Lin, N. S. Jayant, and M. J. Melchner, "A low-delay CELP coder for the CCITT 16 kb/s speech coding standard," IEEE J. Select. Areas Commun. 10(5): 830–849, June 1992.
  5. Soo Hyun Bae, ECE 8873 Data Compression & Modeling, Georgia Institute of Technology, 2004. Archived from the original on 7 September 2006.
  6. Zeghidour, Neil; Luebs, Alejandro; Omran, Ahmed; Skoglund, Jan; Tagliasacchi, Marco (2022). "SoundStream: An End-to-End Neural Audio Codec". IEEE/ACM Transactions on Audio, Speech, and Language Processing. 30: 495–507. arXiv:2107.03312. doi:10.1109/TASLP.2021.3129994. S2CID 236149944.
  7. N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs: Prentice-Hall, 1984.
  8. G.711.1: Wideband embedded extension for G.711 pulse code modulation, ITU-T, 2012, retrieved 2022-12-24.
  9. Gupta, Shipra (May 2016). "Application of MFCC in Text Independent Speaker Recognition" (PDF). International Journal of Advanced Research in Computer Science and Software Engineering. 6 (5): 805–810 (806). ISSN 2277-128X. S2CID 212485331. Archived from the original (PDF) on 2019-10-18. Retrieved 18 October 2019.
  10. Schnell, Markus; Schmidt, Markus; Jander, Manuel; Albert, Tobias; Geiger, Ralf; Ruoppila, Vesa; Ekstrand, Per; Grill, Bernhard (October 2008). MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication (PDF). 125th AES Convention. Fraunhofer IIS. Audio Engineering Society. Retrieved 20 October 2019.
  11. Nagireddi, Sivannarayana (2008). VoIP Voice and Fax Signal Processing. John Wiley & Sons. p. 69. ISBN 9780470377864.
  12. Daniel Eran Dilger (June 8, 2010). "Inside iPhone 4: FaceTime video calling". AppleInsider. Retrieved June 9, 2010.
  13. Presentation of the CELT codec (archived 2011-08-07 at the Wayback Machine) by Timothy B. Terriberry (65 minutes of video; see also presentation slides in PDF).
  14. "Opus Codec". Opus (Home page). Xiph.org Foundation. Retrieved July 31, 2012.
  15. Valin, Jean-Marc; Maxwell, Gregory; Terriberry, Timothy B.; Vos, Koen (October 2013). High-Quality, Low-Delay Music Coding in the Opus Codec. 135th AES Convention. Audio Engineering Society. arXiv:1602.04845.
  16. Leyden, John (27 October 2015). "WhatsApp laid bare: Info-sucking app's innards probed". The Register. Retrieved 19 October 2019.
  17. Hazra, Sudip; Mateti, Prabhaker (September 13–16, 2017). "Challenges in Android Forensics". In Thampi, Sabu M.; Pérez, Gregorio Martínez; Westphall, Carlos Becker; Hu, Jiankun; Fan, Chun I.; Mármol, Félix Gómez (eds.). Security in Computing and Communications: 5th International Symposium, SSCC 2017. Springer. pp. 286–299 (290). doi:10.1007/978-981-10-6898-0_24. ISBN 9789811068980.
  18. Srivastava, Saurabh Ranjan; Dube, Sachin; Shrivastaya, Gulshan; Sharma, Kavita (2019). "Smartphone Triggered Security Challenges: Issues, Case Studies and Prevention". In Le, Dac-Nhuong; Kumar, Raghvendra; Mishra, Brojo Kishore; Chatterjee, Jyotir Moy; Khari, Manju (eds.). Cyber Security in Parallel and Distributed Computing: Concepts, Techniques, Applications and Case Studies. John Wiley & Sons. pp. 187–206 (200). doi:10.1002/9781119488330.ch12. ISBN 9781119488057. S2CID 214034702.
  19. "Open Source Software used in PlayStation4". Sony Interactive Entertainment Inc. Retrieved 2017-12-11.[failed verification]
  20. "GitHub - Codec2". GitHub. November 2019.
  21. Alan McCree, "A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2006, pp. I-705–708, Toulouse, France.
  22. Buckley, Ian (2021-04-08). "Google Makes Its Lyra Low Bitrate Speech Codec Public". MakeUseOf. Retrieved 2022-07-21.
  23. Levent-Levi, Tsahi (2021-04-19). "Lyra, Satin and the future of voice codecs in WebRTC". BlogGeek.me. Retrieved 2022-07-21.
  24. "LPCNet: Efficient neural speech synthesis". Xiph.Org Foundation. 8 August 2023.

External links

  • ITU-T Test Signals for Telecommunication Systems Test Samples
  • ITU-T Perceptual evaluation of speech quality (PESQ) tool Sources

Retrieved from https://en.wikipedia.org/w/index.php?title=Speech_coding&oldid=1216123890