Wikipedia

List of children's speech corpora

A child speech corpus is a speech corpus documenting first-language acquisition. Such databases are used in the development of computer-assisted language learning systems and in the characterization of children's speech at different ages.[1] Children's speech varies not only by language, but also by region within a language. It can also differ for specific groups, such as autistic children, especially when emotion is considered. Different databases are therefore needed for different populations. Corpora are available for American and British English as well as for many other European languages.[1][2][3]

Overview of Children's Speech Corpora

In the table below, the age range may be described in terms of school grades. "K" denotes "kindergarten" while "G" denotes "grade". For example, an age range of "K - G10" refers to speakers ranging from kindergarten age to grade 10.
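The grade notation can be read mechanically. The following is a minimal sketch, not part of the source paper: the function names are hypothetical, and encoding kindergarten as grade 0 is an assumption made only for this illustration. Age ranges given in years (for example "6 - 11") are not handled and raise an error.

```python
import re


def parse_grade(label: str) -> int:
    """Map 'K' to 0 (an assumed encoding) and 'G<n>' to the integer n."""
    label = label.strip().upper()
    if label == "K":
        return 0
    match = re.fullmatch(r"G(\d+)", label)
    if match is None:
        raise ValueError(f"not a grade label: {label!r}")
    return int(match.group(1))


def parse_grade_range(text: str) -> tuple[int, int]:
    """Split a range such as 'K - G10' at the hyphen and parse both ends."""
    low, sep, high = text.partition("-")
    if not sep:
        raise ValueError(f"not a grade range: {text!r}")
    return parse_grade(low), parse_grade(high)


print(parse_grade_range("K - G10"))  # (0, 10)
```

With this encoding, "G3 - G5" becomes (3, 5), so grade-based rows of the table can be compared numerically.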

This table is based on a paper from the Interspeech conference, 2016.[4] This online article is intended to provide an interactive table for readers and a place where information about children's speech corpora can be updated continuously by the speech research community.

| Corpus | Author | Languages | # Speakers | # Utterances | Duration | Age Range | Date | Remarks |
|---|---|---|---|---|---|---|---|---|
| Boulder Learning MyST Corpus (v0.4.0) [5] | Cole et al.[6] | English | 1371 | 228,874 | ~393h | G3 - G5 | 2019 | dialog interaction between a student and a virtual tutor on science topics; sessions typically last 20-40 minutes (wall clock); roughly 49% of the utterances have been transcribed, with more in progress; volunteers encouraged; free for research, flat $10K fee for commercial use |
| CMU Kids Corpus [7] | Eskenazi | English | 24M, 52F | 5180 | | 6 - 11 | 1997 | |
| CSLU Kids' Speech Corpus [8] | Shobaki | English | 1100 | 1017 | | K - G10 | 2007 | |
| PF-STAR Children's Speech Corpus [9][10] | Russell | English | 158 | | ~14.5h | 4 - 14 | 2006 | word-level transcriptions |
| CALL-SLT [11] | Rayner | German | | 5000 | | | 2014 | |
| TBALL [12] | Kazemzadeh | English | 256 | 5000 | 40h | K - G4 | 2005 | partially non-native speech |
| CASS_CHILD [13] | Gao | Mandarin | 23 | | | 1 - 4 | 2012 | phonetic transcriptions |
| CU Children's Read and Prompted Speech Corpus [14] | Hagen | English | 663 | ~100 | | K - G5 | 2001 | consists of isolated words, sentences and short spontaneous storytelling; word-level transcriptions |
| CU Story Corpus [14] | Hagen | English | 106 | 5000 | 40h | G3 - G5 | 2003 | consists of story prompts and spontaneous spoken summaries of the material; word-level transcriptions |
| Providence Corpus [15] | Demuth | English | 6 | | 363h | 1 - 3 | 2006 | mother-child spontaneous speech interactions; broad phonetic transcription |
| Lyon Corpus [16] | Demuth | French | 4 | | 185h | 1 - 3 | 2007 | mother-child spontaneous speech interactions; broad phonetic transcription |
| Demuth Sesotho Corpus [17] | Demuth | Sesotho | 4 | ~13250 | 98h | 2 - 4 | 1992 | family/peer spontaneous speech interactions; morphologically tagged |
| CHIEDE [18] | Garrote | Spanish | 59 | 15444 | ~8h | | 2008 | spontaneous conversation, personal interviews, adult-child interaction; orthographic transcriptions; automatic phonological transcription |
| TIDIGITS [19] | Leonard | English | 326 (101 children) | | | 6 - 15 | 1993 | mix of adult and child speakers |
| FAU Aibo Emotion Corpus | Steidl | German | 51 | | 9h | 10 - 13 | | human-annotated with 11 emotion categories |
| Swedish NICE Corpus [20] | Bell | | | 5580 | | 8 - 15 | 2005 | consists of child-machine and adult-child interactions; orthographic transcriptions |
| SingaKids-Mandarin [4] | Chen | Mandarin | 255 | 79,843 | 125h | 7 - 12 | 2016 | word- and phone-level transcriptions; human-annotated proficiency ratings |
| CFSC [21] | Pascual | Filipino | 57 | | ~8h | 6 - 11 | 2012 | consists of children's read speech; contains both good pronunciations and reading miscues; partially transcribed at word and phoneme levels |

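See also

  - Computer-assisted language learning
  - Language acquisition
  - Language development
  - Non-native speech database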

References

  1. ^ a b Habernal, Ivan; Matousek, Vaclav (2013). Text, Speech, and Dialogue: 16th International Conference, TSD 2013, Pilsen, Czech Republic, September 1-5, 2013, Proceedings. Springer. p. 545. ISBN 9783642405853. Retrieved 11 December 2015.
  2. ^ Neustein, Amy (2014). Speech and Automata in Health Care. Walter de Gruyter. pp. 225–226. ISBN 9781614515159. Retrieved 11 December 2015.
  3. ^ Ronzhin, Andrey; Potapova, Rodmonga; Fakotakis, Nikos (2015). Speech and Computer: 17th International Conference, SPECOM 2015, Athens, Greece, September 20-24, 2015, Proceedings. Springer. pp. 144–145. ISBN 9783319231327. Retrieved 11 December 2015.
  4. ^ a b Nancy F. Chen, Rong Tong, Darren Wee, Peixuan Lee, Bin Ma and Haizhou Li. SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese, in Proc. of Interspeech, 2016.
  5. ^ "MyST Corpus | Boulder Learning inc". Retrieved 2019-07-17.
  6. ^ "My Science Tutor and the MyST Corpus". ResearchGate. Retrieved 2019-07-17.
  7. ^ Maxine Eskenazi, Jack Mostow, and David Graff. The CMU Kids Corpus LDC97S63. Web Download. Philadelphia: Linguistic Data Consortium, 1997.
  8. ^ Khaldoun Shobaki, John-Paul Hosom, and Ronald Cole. CSLU: Kids' Speech Version 1.1 LDC2007S18. Web Download. Philadelphia: Linguistic Data Consortium, 2007.
  9. ^ Martin Russell. The PF-STAR British English Children's Speech Corpus. The Speech Ark Limited. 2006.
  10. ^ Anton Batliner, Mats Blomberg, Shona D'Arcy, Daniel Elenius, Diego Giuliani, Matteo Gerosa, Christian Hacker, Martin Russell, Stefan Steidl, Michael Wong. The PF STAR Children’s Speech Corpus. In Proc. of Interspeech, 2005.
  11. ^ Manny Rayner, Nikos Tsourakis, Claudia Baur, Pierrette Bouillon, Johanna Gerlach. CALL-SLT: A Spoken CALL System based on grammar and speech recognition. In Linguistic Issues in Language Technology, vol. 10, issue 2. 2014.
  12. ^ Abe Kazemzadeh, Hong You, Markus Iseli, Barbara Jones, Xiaodong Cui, Margaret Heritage, Patti Price, Elaine Anderson, Shrikanth Narayanan and Abeer Alwan. TBALL Data Collection: The Making of a Young Children's Speech Corpus, in Proc. of Interspeech, 2005.
  13. ^ Jun Gao, Aijun Li and Ziyu Xiong. Mandarin Multimedia Child Speech Corpus: CASS_CHILD in International Conference on Speech Database and Assessments (Oriental COCOSDA), 2012.
  14. ^ a b Andreas Hagen, Bryan Pellom and Ronald Cole. Children's Speech Recognition with Application to Interactive Books and Tutors in IEEE Workshop on Automatic Speech Recognition and Understanding, 2003.
  15. ^ Demuth, K., Culbertson, J. & Alter, J. 2006. Word-minimality, epenthesis, and coda licensing in the acquisition of English. Language & Speech, 49, 137-174.
  16. ^ Demuth, K. & A. Tremblay. 2007. Prosodically-conditioned variability in children's production of French determiners. Journal of Child Language, 34, 1-29.
  17. ^ Demuth, K. 1992. Acquisition of Sesotho. In D. Slobin (ed.), The Cross-Linguistic Study of Language Acquisition, vol 3, 557-638. Hillsdale, N.J.: Lawrence Erlbaum Associates.
  18. ^ Marta Garrote. CHIEDE: A Spontaneous Child Language Corpus of Spanish. Ph.D. thesis, Universidad Autónoma de Madrid, Spain. 2008.
  19. ^ R. Gary Leonard, and George Doddington. TIDIGITS LDC93S10. Web Download. Philadelphia: Linguistic Data Consortium, 1993.
  20. ^ Linda Bell, Johan Boye, Joakim Gustafson, Mattias Heldner, Anders Lindström and Mats Wirén. The Swedish NICE Corpus - Spoken Dialogues between Children and Embodied Characters in a Computer Game Scenario, in Proc. of Eurospeech, 2005.
  21. ^ Pascual, R. M.; Guevara, R. C. L. (November 2012). "Developing a children's Filipino speech corpus for application in automatic detection of reading miscues and disfluencies". TENCON 2012 IEEE Region 10 Conference: 1–6. doi:10.1109/TENCON.2012.6412235. ISBN 978-1-4673-4824-9. S2CID 8795591.
