Wikipedia

Lexicostatistics

Lexicostatistics is a method of comparative linguistics that involves comparing the percentage of lexical cognates between languages to determine their relationship. Lexicostatistics is related to the comparative method but does not reconstruct a proto-language. It is to be distinguished from glottochronology, which attempts to use lexicostatistical methods to estimate the length of time since two or more languages diverged from a common earlier proto-language. Glottochronology is merely one application of lexicostatistics, however; other applications may not share its assumption of a constant rate of change for basic lexical items.

The term "lexicostatistics" is misleading in that mathematical equations are used but not statistics. Features of a language other than the lexicon may also be used, though this is unusual. Whereas the comparative method uses identified shared innovations to determine sub-groups, lexicostatistics does not identify these. Lexicostatistics is a distance-based method, whereas the comparative method considers language characters directly. The lexicostatistic method is simple and fast relative to the comparative method but has limitations (discussed below). Its results can be validated by cross-checking the trees produced by both methods.

History

Lexicostatistics was developed by Morris Swadesh in a series of articles in the 1950s, based on earlier ideas.[1][2][3] The concept's first known use was by Dumont d'Urville in 1834, who compared various "Oceanic" languages and proposed a method for calculating a coefficient of relationship. Hymes (1960) and Embleton (1986) both review the history of lexicostatistics.[4][5]

Method

Create word list

The aim is to generate a list of universally used meanings (hand, mouth, sky, I). Words are then collected for these meaning slots for each language being considered. Swadesh originally reduced a larger set of meanings down to 200. He later found that it was necessary to reduce the list further but that he could include some meanings that were not in his original list, giving his later 100-item list. The Swadesh list in Wiktionary gives a total of 207 meanings in a number of languages. Alternative lists that apply more rigorous criteria have been generated, e.g. the Dolgopolsky list and the Leipzig–Jakarta list, as well as lists with a more specific scope; for example, Dyen, Kruskal and Black have 200 meanings for 84 Indo-European languages in digital form.[6]

Determine cognacies

A trained and experienced linguist is needed to make cognacy decisions, and the decisions may need to be refined as the state of knowledge increases. However, lexicostatistics does not rely on all the decisions being correct. For each pair of words (in different languages) in this list, the cognacy of a form could be positive, negative or indeterminate. Sometimes a language has multiple words for one meaning, e.g. "small" and "little" for "not big".
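The three-way outcome and the handling of synonyms can be illustrated with a minimal Python sketch. It assumes a hypothetical encoding in which the linguist tags every word with a cognate-class label, or with None when undecided; the function name judge, the labels and the rule that any shared class counts as positive are assumptions made for illustration, not part of any published procedure.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: one meaning slot holds a list of (word, class label)
# pairs; the label is None when the linguist could not decide.
Slot = List[Tuple[str, Optional[str]]]

def judge(a: Slot, b: Slot) -> Optional[bool]:
    """Return True (positive), False (negative) or None (indeterminate)."""
    if not a or not b:
        return None  # a missing word leaves the slot indeterminate
    classes_a = {label for _, label in a if label is not None}
    classes_b = {label for _, label in b if label is not None}
    if classes_a & classes_b:
        return True  # assumption: one shared class among synonyms is enough
    # An undecided form on either side keeps the judgement open.
    if any(label is None for _, label in a + b):
        return None
    return False

# Placeholder labels, not real etymological judgements.
lang1_not_big = [("small", "c1"), ("little", "c2")]
lang2_not_big = [("klein", "c3")]
result = judge(lang1_not_big, lang2_not_big)  # negative: no shared class
```

How synonyms are treated is one of the subjective choices in the method; a stricter variant could require the most common word in each language to match.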

Calculate lexicostatistic percentages

The percentage for a particular language pair is the proportion of meanings that are cognate, taken relative to the total number of meanings excluding indeterminate cases. This value is entered into an N×N table of distances, where N is the number of languages being compared. Because each pair of languages needs only one entry, the completed table is half-filled, in triangular form. The higher the proportion of cognacy, the more closely the languages are related.
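As a rough illustration of this step, the following Python sketch computes the percentage for one language pair and fills the triangular table. The function names, the use of None for an indeterminate judgement and the sample values are assumptions made for the example, not data from any study.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

def cognate_percentage(judgements: List[Optional[bool]]) -> float:
    """Percentage of cognate meanings, ignoring indeterminate (None) slots."""
    decided = [j for j in judgements if j is not None]
    if not decided:
        raise ValueError("no determinate judgements for this pair")
    return 100.0 * sum(decided) / len(decided)

def percentage_table(
    languages: List[str],
    judgements_by_pair: Dict[Tuple[str, str], List[Optional[bool]]],
) -> Dict[Tuple[str, str], float]:
    """Fill only one triangle of the N x N table: one entry per pair."""
    return {
        (a, b): cognate_percentage(judgements_by_pair[(a, b)])
        for a, b in combinations(languages, 2)
    }

# Illustrative judgements for three placeholder languages over four meanings.
sample = {
    ("A", "B"): [True, True, False, None],   # 2 of 3 decided slots cognate
    ("A", "C"): [True, False, False, False],
    ("B", "C"): [False, False, None, None],
}
table = percentage_table(["A", "B", "C"], sample)
```

For N languages the table holds N(N-1)/2 entries, which is why only one triangle needs to be stored.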

Create family tree

Creation of the language tree is based solely on the table found above. Various sub-grouping methods can be used; the one adopted by Dyen, Kruskal and Black was:

  • all lists are placed in a pool
  • the two closest members are removed and form a nucleus which is placed in the pool
  • this step is repeated
  • under certain conditions a nucleus becomes a group
  • this is repeated until the pool only contains one group.

Lexical percentages also have to be calculated for nuclei and groups.
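The pooling procedure can be approximated with a simple agglomerative sketch in Python. This is not the Dyen, Kruskal and Black algorithm itself: the conditions under which a nucleus becomes a group are not specified here, so the sketch omits that step, and it assumes average linkage (the mean percentage over all cross-pairs) as the percentage between merged clusters. The name build_tree and the sample percentages are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

def build_tree(pct: Dict[FrozenSet[str], float]) -> Tree:
    """pct maps frozenset({lang_a, lang_b}) to that pair's cognate percentage."""
    # Start with every language as its own member of the pool.
    pool: Dict[FrozenSet[str], Tree] = {
        frozenset([lang]): lang for lang in sorted(set().union(*pct))
    }

    def score(x: FrozenSet[str], y: FrozenSet[str]) -> float:
        # Assumption: average linkage over all cross-pairs of languages.
        values = [pct[frozenset([a, b])] for a in x for b in y]
        return sum(values) / len(values)

    while len(pool) > 1:
        # The two closest members are those with the highest percentage.
        x, y = max(combinations(pool, 2), key=lambda pair: score(*pair))
        # Remove both and place the merged cluster back in the pool.
        pool[x | y] = (pool.pop(x), pool.pop(y))
    (tree,) = pool.values()
    return tree

# Illustrative percentages for three placeholder languages.
pct = {
    frozenset(["A", "B"]): 80.0,
    frozenset(["A", "C"]): 40.0,
    frozenset(["B", "C"]): 50.0,
}
tree = build_tree(pct)  # A and B merge first, then C joins them
```

The input must cover every pair of at least two languages. Other linkage rules would give different trees for the same table, which is one reason the sub-grouping method needs to be stated alongside the results.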

Applications

A leading exponent of the application of lexicostatistics has been Isidore Dyen.[7][8][9][10] He used lexicostatistics to classify Austronesian languages[11] as well as Indo-European ones.[6] A major study of the latter was reported by Dyen, Kruskal and Black (1992).[6] Studies have also been carried out on Amerindian and African languages.

Pama-Nyungan

The problem of internal branching within the Pama-Nyungan language family has been a long-standing issue for Australianist linguistics, and general consensus held either that internal connections between the 25+ different subgroups of Pama-Nyungan were impossible to reconstruct or that the subgroups were not in fact genetically related at all.[12] In 2012, Claire Bowern and Quentin Atkinson published the results of applying computational phylogenetic methods to 194 doculects representing all major subgroups and isolates of Pama-Nyungan.[13] Their model "recovered" many of the branches and divisions that had previously been proposed and accepted by many other Australianists, while also providing some insight into the more problematic branches, such as Paman (which is complicated by the lack of data) and Ngumpin-Yapa (where the genetic picture is obscured by very high rates of borrowing between languages). Their dataset forms the largest of its kind for a hunter-gatherer language family, and the second largest overall after Austronesian (Greenhill et al. 2008). They conclude that Pama-Nyungan languages are in fact not exceptional to lexicostatistical methods, which have successfully been applied to other language families of the world.

Criticisms

Critics such as Hoijer (1956) have shown that there were difficulties in finding equivalents to the meaning items, while many have found it necessary to modify Swadesh's lists.[14] Gudschinsky (1956) questioned whether it was possible to obtain a universal list.[15]

Factors such as borrowing, tradition and taboo can skew the results, as with other methods. Lexicostatistics has sometimes been applied using lexical similarity rather than cognacy to find resemblances. This is then equivalent to mass comparison.

The choice of meaning slots is subjective, as is the choice of synonyms.

Improved methods

Some modern computational methods of statistical hypothesis testing can be regarded as improvements of lexicostatistics in that they use similar word lists and distance measures.

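See also

  • Basic English
  • Cognate
  • Comparative linguistics
  • Comparative method
  • Global Lexicostatistical Database
  • Glottochronology
  • Historical linguistics
  • Indo-European studies
  • Intercontinental Dictionary Series
  • Linguistic distance
  • Mass lexical comparison
  • Proto-language
  • Swadesh list
  • Word list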

References

  1. ^ Swadesh, Morris (1955). "Towards greater accuracy in lexicostatistical dating". International Journal of American Linguistics. 21 (2): 121–137. doi:10.1086/464321. S2CID 144581963.
  2. ^ Swadesh, Morris (1952). "Lexicostatistical dating of prehistoric ethnic contacts". Proceedings of the American Philosophical Society. 96: 452–463.
  3. ^ Swadesh, Morris (1950). "Salish internal relationships". International Journal of American Linguistics. 16 (4): 157–167. doi:10.1086/464084. S2CID 145122561.
  4. ^ Hymes, Dell (1960). "Lexicostatistics so far". Current Anthropology. 1 (1): 3–44. doi:10.1086/200074. S2CID 144569209.
  5. ^ Embleton, Sheila (1986). Statistics in Historical Linguistics. Bochum.
  6. ^ a b c Dyen, Isidore; Kruskal, Joseph; Black, Paul (1992). "An Indoeuropean Classification, a Lexicostatistical Experiment". Transactions of the American Philosophical Society. 82 (5): iii–132. doi:10.2307/1006517. JSTOR 1006517.
  7. ^ Dyen, Isidore (1962). "The lexicostatistically determined relationship of a language group". International Journal of American Linguistics. 28 (3): 153–161. doi:10.1086/464687. S2CID 143070513.
  8. ^ Dyen, Isidore (1963). "Lexicostatistically determined borrowing and taboo". Language. 39 (1): 60–66. doi:10.2307/410762. JSTOR 410762.
  9. ^ Dyen, Isidore, ed. (1973). Lexicostatistics in Genetic Linguistics. The Hague: Mouton.
  10. ^ Dyen, Isidore (1975). Linguistic Subgrouping and Lexicostatistics. The Hague: Mouton.
  11. ^ Dyen, Isidore (1965). "A lexicostatistical classification of the Austronesian languages". International Journal of American Linguistics. 19.
  12. ^ Dixon, Robert M.W. (2002). Australian languages: their nature and development. Cambridge University Press. pp. 48, 53. Australia provides a prototypical instance of a linguistic area. It has considerable time-depth, fairly uniform terrain leading to ease of interaction and communication, a fair proportion of reciprocal exogamous marriages, rampant multilingualism, and an open attitude to borrowing ... There is a basic uniformity to Australian languages which is the natural result of a long period of diffusion. Although no justification had been provided for 'Pama-Nyungan', it came to be accepted. People accepted it because it was accepted—as a species of belief. ... It is clear that 'Pama-Nyungan' cannot be supported as a genetic group. Nor is it a useful typological grouping.
  13. ^ Bowern, Claire; Atkinson, Quentin (2012). "Computational phylogenetics and the internal structure of Pama-Nyungan". Language. 88 (4): 817–845. doi:10.1353/lan.2012.0081. hdl:1885/61360. S2CID 4375648.
  14. ^ Hoijer, Harry (1956). "Lexicostatistics: a critique". Language. 32 (1): 49–60. doi:10.2307/410652. JSTOR 410652.
  15. ^ Gudschinsky, Sarah (1956). "The ABCs of lexicostatistics (glottochronology)". Word. 12 (2): 175–210. doi:10.1080/00437956.1956.11659599.

Further reading

  • Dobson, Annette (1969). Lexicostatistical Grouping. Anthropological Linguistics 7, 216-221.
  • Dobson, Annette and Black, Paul (1979). Multidimensional Scaling of some Lexicostatistical Data. Mathematical Scientist 1979/4, 55-61.
  • McMahon, April and McMahon, Robert (2005). Language Classification by Numbers. Oxford University Press.
  • Sankoff, David (1970). "On the Rate of Replacement of Word-Meaning Relationships." Language 46.564-569.
  • Wittmann, Henri (1969). "A lexico-statistic inquiry into the diachrony of Hittite." Indogermanische Forschungen 74.1-10.
  • Wittmann, Henri (1973). "The lexicostatistical classification of the French-based Creole languages." Lexicostatistics in genetic linguistics: Proceedings of the Yale conference, April 3–4, 1971, dir. Isidore Dyen, 89-99. La Haye: Mouton.

External links

  • The Global Lexicostatistical Database, part of the Evolution of Human Languages project
  • A simplified explanation of the difference between glottochronology and lexicostatistics.
