Wikipedia

Linear predictive coding

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model.[1][2]

LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique and a useful method for encoding good-quality speech at a low bit rate.

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for voiced sounds), with occasional added hissing and popping sounds (for voiceless sounds such as sibilants and plosives). Although apparently crude, this source–filter model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.
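The analysis step can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion (one common way to estimate the predictor, not the only one), and the function names, the test frame and the predictor order of 2 are all hypothetical choices made for this example.

```python
import math

def autocorrelation(x, max_lag):
    """Return r[0..max_lag], the autocorrelation of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Estimate predictor coefficients a[1..order] from autocorrelation r.

    Convention assumed here: x[n] is predicted as
    sum(a[k] * x[n - k] for k in 1..order); a[0] is unused.
    Returns (a, remaining prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

def inverse_filter(x, a):
    """Residue e[n] = x[n] - sum(a[k] * x[n-k]); samples before the frame count as 0."""
    order = len(a) - 1
    return [x[n] - sum(a[k] * x[n - k] for k in range(1, min(order, n) + 1))
            for n in range(len(x))]

# Hypothetical test frame: one decaying resonance, i.e. a single "formant".
frame = [0.95 ** n * math.sin(0.3 * n) for n in range(240)]
r = autocorrelation(frame, 2)
a, err = levinson_durbin(r, 2)
residue = inverse_filter(frame, a)
```

For this frame the residue carries only a small fraction of the signal energy, which is what makes it cheaper to describe than the original samples.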

The numbers that describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted elsewhere. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.
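The synthesis direction can be illustrated the same way. This sketch assumes a hypothetical second-order predictor and an idealized buzz made of one impulse every 40 samples; real coders derive both from the analysis stage. It shows that running the source through the all-pole filter and then inverse filtering returns the source.

```python
def synthesize(source, a):
    """All-pole synthesis filter: y[n] = source[n] + sum(a[k] * y[n-k])."""
    order = len(a) - 1
    y = []
    for n, s in enumerate(source):
        y.append(s + sum(a[k] * y[n - k] for k in range(1, min(order, n) + 1)))
    return y

def inverse_filter(x, a):
    """Analysis (inverse) filter: e[n] = x[n] - sum(a[k] * x[n-k])."""
    order = len(a) - 1
    return [x[n] - sum(a[k] * x[n - k] for k in range(1, min(order, n) + 1))
            for n in range(len(x))]

# Hypothetical parameters: a[0] is unused; a[1], a[2] describe one resonance.
a = [0.0, 1.8, -0.9]
# Idealized "buzz": one impulse every 40 samples.
source = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]

speech = synthesize(source, a)         # run the source through the "tube"
recovered = inverse_filter(speech, a)  # undo the filter again
```

Up to floating-point rounding, `recovered` equals `source`, so the two filters are exact inverses of each other.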

Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second give intelligible speech with good compression.
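As a rough illustration of framing, the sketch below cuts a signal into non-overlapping frames. The 8000 Hz sample rate is an assumption made for the example, and practical coders usually also apply a window and overlap adjacent frames, which is omitted here.

```python
def split_into_frames(signal, sample_rate, frames_per_second):
    """Cut a signal into equal, non-overlapping frames (a simplification)."""
    frame_len = sample_rate // frames_per_second
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

sample_rate = 8000                  # assumed sample rate in Hz
one_second = [0.0] * sample_rate    # placeholder for one second of samples
frames = split_into_frames(one_second, sample_rate, 50)
```

At 50 frames per second each frame holds 160 samples, or 20 ms of signal, and each frame gets its own set of LPC parameters.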

Early history

Linear prediction (signal estimation) goes back to at least the 1940s, when Norbert Wiener developed a mathematical theory for calculating the best filters and predictors for detecting signals hidden in noise.[3][4] Soon after Claude Shannon established a general theory of coding, work on predictive coding was done by C. Chapin Cutler,[5] Bernard M. Oliver[6] and Henry C. Harrison.[7] Peter Elias in 1955 published two papers on predictive coding of signals.[8][9]

Linear predictors were applied to speech analysis independently by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone in 1966, and in 1967 by Bishnu S. Atal, Manfred R. Schroeder and John Burg. Itakura and Saito described a statistical approach based on maximum likelihood estimation; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on the principle of maximum entropy.[4][10][11][12]

In 1969, Itakura and Saito introduced a method based on partial correlation (PARCOR), Glen Culler proposed real-time speech encoding, and Bishnu S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971, real-time LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.[13] LPC technology was advanced by Bishnu Atal and Manfred Schroeder during the 1970s–1980s.[13] In 1978, Atal and Vishwanath et al. of BBN developed the first variable-rate LPC algorithm.[13] The same year, Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech codec called adaptive predictive coding, which used a psychoacoustic coding algorithm exploiting the masking properties of the human ear.[14][15] This later became the basis for the perceptual coding technique used by the MP3 audio compression format, introduced in 1993.[14] Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985.[16]

LPC is the basis for voice-over-IP (VoIP) technology.[13] In 1972, Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to an informal Lincoln Laboratory history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976, the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s.[citation needed]

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.
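The stability risk can be made concrete with a small check. The sketch below assumes the prediction convention x[n] ≈ a1·x[n-1] + ... + ap·x[n-p] and uses the step-down (backward Levinson) recursion to turn predictor coefficients into reflection coefficients; the synthesis filter is stable exactly when every reflection coefficient has magnitude below 1. The function names and coefficient values are hypothetical.

```python
def reflection_from_predictor(a):
    """Step-down recursion: predictor [unused, a1..ap] -> [k1..kp].

    Returns None as soon as some |k| >= 1, which means the
    corresponding all-pole synthesis filter is not stable.
    """
    cur = list(a[1:])                  # cur[j] holds a_(j+1) of the current order
    ks = []
    for i in range(len(cur), 0, -1):
        k = cur[i - 1]                 # last coefficient is the reflection coefficient
        ks.append(k)
        if abs(k) >= 1.0:
            return None
        # Recover the order-(i-1) predictor from the order-i predictor.
        cur = [(cur[j] + k * cur[i - 2 - j]) / (1.0 - k * k) for j in range(i - 1)]
    ks.reverse()
    return ks

def is_stable(a):
    return reflection_from_predictor(a) is not None

original = [0.0, 1.9, -0.99]    # hypothetical predictor with a sharp resonance
corrupted = [0.0, 1.9, -1.01]   # the same predictor with a 0.02 error in a2
```

Here `is_stable(original)` holds while `is_stable(corrupted)` does not: an error of about 2% in one coefficient moves a pole outside the unit circle.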

There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures the stability of the predictor and keeps spectral errors local for small coefficient deviations.
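As a small illustration of why such representations tolerate quantization better, the sketch below converts reflection coefficients to log area ratios and back. The sign convention for LAR varies between texts; the form log((1 + k) / (1 - k)) is assumed here, and the coefficient values and the step size of 0.1 are hypothetical and not tuned to any bit rate.

```python
import math

def lar_from_reflection(k):
    """Log area ratio of a reflection coefficient with |k| < 1 (one common convention)."""
    return math.log((1.0 + k) / (1.0 - k))

def reflection_from_lar(g):
    """Inverse mapping; tanh maps any moderate LAR value back inside (-1, 1)."""
    return math.tanh(g / 2.0)

def quantize(value, step):
    """Uniform quantizer used only for illustration."""
    return round(value / step) * step

ks = [0.95, -0.99, 0.3]        # hypothetical reflection coefficients
step = 0.1

# Quantizing the reflection coefficients directly can land on |k| = 1.
direct = [quantize(k, step) for k in ks]

# Quantizing in the LAR domain and mapping back stays strictly inside (-1, 1).
via_lar = [reflection_from_lar(quantize(lar_from_reflection(k), step)) for k in ks]
```

With these values, direct quantization rounds -0.99 to -1.0, the edge of instability, while every value decoded from the LAR domain keeps a magnitude below 1.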

Applications

LPC is the most widely used method in speech coding and speech synthesis.[17] It is generally used for speech analysis and resynthesis. Phone companies use it as a form of voice compression, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.

LPC predictors are used in lossless audio codecs such as Shorten, MPEG-4 ALS and FLAC, as well as in the SILK speech codec.
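The idea behind lossless use of a predictor can be sketched with integer samples. The example assumes a fixed second-order predictor, 2·x[n-1] - x[n-2]; it illustrates the principle only and does not reproduce the bitstream or the adaptive predictors of any of the codecs named above. The sample values are made up.

```python
def encode_fixed2(samples):
    """Replace each sample by its error against the predictor 2*x[n-1] - x[n-2]."""
    residuals = []
    for n, x in enumerate(samples):
        if n < 2:
            residuals.append(x)        # warm-up samples are stored verbatim
        else:
            residuals.append(x - (2 * samples[n - 1] - samples[n - 2]))
    return residuals

def decode_fixed2(residuals):
    """Rebuild the samples exactly by re-running the same predictor."""
    out = []
    for n, e in enumerate(residuals):
        if n < 2:
            out.append(e)
        else:
            out.append(e + 2 * out[n - 1] - out[n - 2])
    return out

pcm = [0, 12, 25, 37, 48, 58, 66, 73, 78, 81]   # hypothetical smooth waveform
residuals = encode_fixed2(pcm)
restored = decode_fixed2(residuals)
```

Because all arithmetic is on integers, `restored` equals `pcm` exactly, while the residuals after the two warm-up samples are much smaller than the samples and therefore cheaper to entropy-code.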

LPC received some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.[18]

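See also

  • Akaike information criterion
  • Audio compression
  • Code-excited linear prediction (CELP)
  • FS-1015
  • FS-1016
  • Generalized filtering
  • Linear prediction
  • Linear predictive analysis
  • Pitch estimation
  • Warped linear predictive coding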

References

  1. ^ Deng, Li; Douglas O'Shaughnessy (2003). Speech processing: a dynamic and optimization-oriented approach. Marcel Dekker. pp. 41–48. ISBN 978-0-8247-4040-5.
  2. ^ Beigi, Homayoon (2011). Fundamentals of Speaker Recognition. Berlin: Springer-Verlag. ISBN 978-0-387-77591-3.
  3. ^ B.S. Atal (2006). "The history of linear prediction". IEEE Signal Processing Magazine. 23 (2): 154–161. Bibcode:2006ISPM...23..154A. doi:10.1109/MSP.2006.1598091. S2CID 15601493.
  4. ^ a b Y. Sasahira; S. Hashimoto (1995). "Voice pitch changing by Linear Predictive Coding Method to keep the Singer's Personal Timbre" (PDF).
  5. ^ US 2605361, C. C. Cutler, "Differential quantization of communication signals", published 1952-07-29
  6. ^ B. M. Oliver (1952). "Efficient coding". 31 (4). Nokia Bell Labs: 724–750.
  7. ^ H. C. Harrison (1952). "Experiments with linear prediction in television". Bell System Technical Journal. 31: 764–783.
  8. ^ P. Elias (1955). "Predictive coding I". IRE Trans. Inform. Theory. IT-1 (1): 16–24.
  9. ^ P. Elias (1955). "Predictive coding II". IRE Trans. Inform. Theory. IT-1 (1): 24–33.
  10. ^ S. Saito; F. Itakura (Jan 1967). "Theoretical consideration of the statistical optimum recognition of the spectral density of speech". J. Acoust. Soc. Japan.
  11. ^ B.S. Atal; M.R. Schroeder (1967). "Predictive coding of speech". Conf. Communications and Proc.
  12. ^ J.P. Burg (1967). "Maximum Entropy Spectral Analysis". Proceedings of 37th Meeting, Society of Exploration Geophysics, Oklahoma City.
  13. ^ a b c d Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol" (PDF). Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036. ISSN 1932-8346. Archived (PDF) from the original on 2022-10-09.
  14. ^ a b Schroeder, Manfred R. (2014). "Bell Laboratories". Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. Springer. p. 388. ISBN 9783319056609.
  15. ^ Atal, B.; Schroeder, M. (1978). "Predictive coding of speech signals and subjective error criteria". ICASSP '78. IEEE International Conference on Acoustics, Speech, and Signal Processing. 3: 573–576. doi:10.1109/ICASSP.1978.1170564.
  16. ^ Schroeder, Manfred R.; Atal, Bishnu S. (1985). "Code-excited linear prediction (CELP): High-quality speech at very low bit rates". ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing. 10: 937–940. doi:10.1109/ICASSP.1985.1168147. S2CID 14803427.
  17. ^ Gupta, Shipra (May 2016). "Application of MFCC in Text Independent Speaker Recognition" (PDF). International Journal of Advanced Research in Computer Science and Software Engineering. 6 (5): 805–810 (806). ISSN 2277-128X. S2CID 212485331. Archived from the original (PDF) on 2019-10-18. Retrieved 18 October 2019.
  18. ^ Tai, Hwan-Ching; Chung, Dai-Ting (June 14, 2012). "Stradivari Violins Exhibit Formant Frequencies Resembling Vowels Produced by Females". Savart Journal. 1 (2).
  • Robert M. Gray, IEEE Signal Processing Society, Distinguished Lecturer Program

Further reading

  • O'Shaughnessy, D. (1988). "Linear predictive coding". IEEE Potentials. 7 (1): 29–32. doi:10.1109/45.1890. S2CID 12786562.
  • Bundy, Alan; Wallen, Lincoln (1984). A Generalisation of the Glivenko-Cantelli Theorem. Symbolic Computation. p. 61. doi:10.1007/978-3-642-96868-6_123. ISBN 978-3-540-13938-6.
  • El-Jaroudi, Amro (2003). "Linear Predictive Coding". Wiley Encyclopedia of Telecommunications. Encyclopedia of Telecommunications. doi:10.1002/0471219282.eot155. ISBN 978-0471219286.

External links

  • real-time LPC analysis/synthesis learning software
  • 30 years later Dr Richard Wiggins Talks Speak & Spell development
