Wikipedia

Lemma (morphology)

In morphology and lexicography, a lemma (pl.: lemmas or lemmata) is the canonical form,[1] dictionary form, or citation form of a set of word forms.[2] In English, for example, break, breaks, broke, broken and breaking are forms of the same lexeme, with break as the lemma by which they are indexed. Lexeme, in this context, refers to the set of all the inflected or alternating forms in the paradigm of a single word, and lemma refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish, and Russian. The process of determining the lemma for a given lexeme is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.
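The relationship between a lexeme, its forms, and its lemma can be sketched in a few lines of Python. This is only an illustration: the `Lexeme` class, the `lemmatise` function, and the hand-written table are assumptions made for the example, not part of any standard library or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """A lexeme: a set of word forms plus the form chosen to cite it."""
    lemma: str
    forms: frozenset

    def __post_init__(self):
        # The lemma is itself one of the lexeme's forms.
        if self.lemma not in self.forms:
            raise ValueError(f"lemma {self.lemma!r} is not among the forms")


# The English example from the text; the table is written by hand.
BREAK = Lexeme(
    lemma="break",
    forms=frozenset({"break", "breaks", "broke", "broken", "breaking"}),
)


def lemmatise(word, lexemes):
    """Return the lemma of the first lexeme containing `word`, else None."""
    for lexeme in lexemes:
        if word in lexeme.forms:
            return lexeme.lemma
    return None


print(lemmatise("broke", [BREAK]))  # break
print(lemmatise("bread", [BREAK]))  # None
```

A table lookup like this sidesteps the question of which form should be the lemma; that choice is a convention recorded in the data, which matches the point that lemmatisation is partly arbitrary.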

Morphology

The form of a word that is chosen to serve as the lemma is usually the least marked form, but there are several exceptions such as the use of the infinitive for verbs in some languages.

For English, the citation form of a noun is the singular (and non-possessive) form: mouse rather than mice. For multiword lexemes that contain possessive adjectives or reflexive pronouns, the citation form uses a form of the indefinite pronoun one: do one's best, perjure oneself. In European languages with grammatical gender, the citation form of regular adjectives and nouns is usually the masculine singular.[citation needed] If the language also has cases, the citation form is often the masculine singular nominative.

For many languages, the citation form of a verb is the infinitive: French aller, German gehen, Hindustani जाना/جانا, Spanish ir. English verbs usually have an infinitive, which in its bare form (without the particle to) is its least marked (for example, break is chosen over to break, breaks, broke, breaking, and broken); for defective verbs with no infinitive the present tense is used (for example, must has only one form while shall has no infinitive, and both lemmas are their lexemes' present tense forms). For Latin, Ancient Greek, Modern Greek, and Bulgarian, the first person singular present tense is traditionally used, but some modern dictionaries use the infinitive instead (except for Bulgarian, which lacks infinitives; for contracted verbs in Ancient Greek, an uncontracted first person singular present tense is used to reveal the contract vowel: φιλέω philéō for φιλῶ philō "I love" [implying affection], ἀγαπάω agapáō for ἀγαπῶ agapō "I love" [implying regard]). Finnish dictionaries list verbs not under their root, but under the first infinitive, marked with -(t)a, -(t)ä.

For Japanese, the non-past (present and future) tense is used. For Arabic the third-person singular masculine of the past/perfect tense is the least-marked form and is used for entries in modern dictionaries. In older dictionaries, which are still commonly used, the triliteral of the word, either a verb or a noun, is used. This is similar to Hebrew, which also uses the third-person singular masculine perfect form, e.g. ברא bara' create, כפר kaphar deny. Georgian uses the verbal noun. For Korean, -da is attached to the stem.
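The Korean rule is simple enough to write down directly. The sketch below assumes the stem is already known and is given in Hangul; the function name and the two sample stems (가, 먹) are chosen for illustration and do not come from the text above.

```python
def korean_citation_form(stem: str) -> str:
    """Attach the ending -da (다) to a verb stem to get its dictionary form."""
    return stem + "다"


print(korean_citation_form("가"))  # 가다
print(korean_citation_form("먹"))  # 먹다
```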

In Tamil, an agglutinative language, the verb stem (which is also the imperative form - the least marked one) is often cited, e.g., இரு

In Irish, words are highly inflected by case (genitive, nominative, dative and vocative) and by their place within a sentence because of initial mutations. The noun cainteoir, the lemma for the noun meaning "speaker", has a variety of forms: chainteoir, gcainteoir, cainteora, chainteora, cainteoirí, chainteoirí and gcainteoirí.
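To show how several surface forms collapse onto one lemma, the forms of cainteoir listed above can be handled in two steps: undo the initial mutation, then look up the remaining inflected form. This is a minimal sketch that covers only this one noun; the function names and the lookup table are assumptions, and a real lemmatiser would need rules for other initial consonants and for words that genuinely begin with "ch" or "gc".

```python
# Hand-written table for one noun; the endings and mutations of other
# Irish words are not covered by this sketch.
INFLECTED_TO_LEMMA = {
    "cainteoir": "cainteoir",
    "cainteora": "cainteoir",
    "cainteoirí": "cainteoir",
}


def undo_initial_mutation(word):
    """Undo the two mutations seen in the text: ch- and gc- back to c-."""
    if word.startswith("ch") or word.startswith("gc"):
        return "c" + word[2:]
    return word


def lemma_of(word):
    """Return the lemma for a listed form, or None if the form is unknown."""
    return INFLECTED_TO_LEMMA.get(undo_initial_mutation(word))


for form in ["chainteoir", "gcainteoir", "chainteora", "gcainteoirí"]:
    print(form, "->", lemma_of(form))
```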

Some phrases are cited in a sort of lemma: Carthago delenda est (literally, "Carthage must be destroyed") is a common way of citing Cato, but what he said was nearer to censeo Carthaginem esse delendam ("I hold Carthage to be in need of destruction").

Lexicography

In a dictionary, the lemma "go" represents the inflected forms "go", "goes", "going", "went", and "gone". The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g., "went" < "go". The disadvantage of listing only lemmas is that a declined or conjugated form cannot be looked up directly, although some dictionaries, like Webster's Dictionary, do list "went". Multilingual dictionaries vary in how they deal with this issue: the Langenscheidt dictionary of German does not list ging (< gehen), but the Cassell does.
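In an electronic dictionary the lookup problem can be addressed with a reverse index from every inflected form to its lemma. The following is a rough sketch under the assumption that entries are stored keyed by lemma; `ENTRIES`, `build_form_index`, and `cross_reference` are hypothetical names used only here.

```python
# Lemma-keyed entries, as a dictionary might store them (hand-written sample).
ENTRIES = {
    "go": ["go", "goes", "going", "went", "gone"],
}


def build_form_index(entries):
    """Map every inflected form back to its lemma."""
    index = {}
    for lemma, forms in entries.items():
        for form in forms:
            index[form] = lemma
    return index


def cross_reference(form, index):
    """Render the angle-bracket notation, or None if the form is unknown."""
    lemma = index.get(form)
    if lemma is None:
        return None
    return f'"{form}" < "{lemma}"'


index = build_form_index(ENTRIES)
print(cross_reference("went", index))  # "went" < "go"
```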

Lemmas or word stems are used often in corpus linguistics for determining word frequency. In that usage, the specific definition of "lemma" is flexible depending on the task it is being used for.
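Counting by lemma rather than by surface form might look like the sketch below. The mapping `FORM_TO_LEMMA` is a small hand-written assumption; which forms are grouped under one lemma is exactly the task-dependent decision mentioned above.

```python
from collections import Counter

# Illustrative mapping only; a real corpus study would use a full lexicon.
FORM_TO_LEMMA = {
    "goes": "go", "going": "go", "went": "go", "gone": "go",
    "breaks": "break", "broke": "break",
}


def lemma_frequencies(tokens, form_to_lemma):
    """Count tokens per lemma; unknown tokens count as their own lemma."""
    return Counter(form_to_lemma.get(t.lower(), t.lower()) for t in tokens)


tokens = "She went home and he goes out but they broke it going home".split()
freq = lemma_frequencies(tokens, FORM_TO_LEMMA)
print(freq["go"])    # 3
print(freq["home"])  # 2
```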

Pronunciation

A word may have different pronunciations, depending on its phonetic environment (the neighbouring sounds) or on the degree of stress in a sentence. An example of the latter is the weak and strong forms of certain English function words like some and but (pronounced /sʌm/, /bʌt/ when stressed but /s(ə)m/, /bət/ when unstressed). Dictionaries usually give the pronunciation used when the word is pronounced alone (its isolation form) and with stress, but they may also note common weak forms of pronunciation.
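One way a dictionary's data might record this is to store the stressed isolation form as the default and the weak form as an optional extra. The sketch below uses the two words from the text; the structure and the names `PRONUNCIATIONS` and `pronounce` are assumptions for illustration.

```python
# Strong (isolation) and weak forms for the two function words in the text.
PRONUNCIATIONS = {
    "some": {"strong": "/sʌm/", "weak": "/s(ə)m/"},
    "but": {"strong": "/bʌt/", "weak": "/bət/"},
}


def pronounce(word, stressed=True):
    """Return the strong form when stressed, else the weak form if listed."""
    forms = PRONUNCIATIONS[word]
    if stressed:
        return forms["strong"]
    return forms.get("weak", forms["strong"])


print(pronounce("but"))                  # /bʌt/
print(pronounce("but", stressed=False))  # /bət/
```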

Difference between stem and lemma

The stem is the part of the word that never changes even when morphologically inflected; a lemma is the least marked form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-". This is because there are words such as production and producing.[3][failed verification] In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed.[citation needed] When phonology is taken into account, the definition of the unchangeable part of the word is not useful, as can be seen in the phonological forms of the words in the preceding example: "produced" /prəˈdjuːst/ vs. "production" /prəˈdʌkʃən/.

Some lexemes have several stems but one lemma. For instance the verb "to go" has the stems "go" and "went" due to suppletion: the past tense was co-opted from a different verb, "to wend".
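The "part that never changes" reading of stem can be approximated, for written forms only, as the longest common prefix of the forms. The sketch below uses Python's standard `os.path.commonprefix`, which compares strings character by character; the helper name `shared_stem` is an assumption. It reproduces "produc" for the example above and returns an empty string for the suppletive forms of "go", which is one way to see why such a lexeme needs more than one stem.

```python
import os.path


def shared_stem(forms):
    """Longest common prefix of the written forms; '' if they share none."""
    return os.path.commonprefix(list(forms))


print(shared_stem(["produced", "production", "producing"]))  # produc
print(repr(shared_stem(["go", "goes", "going", "went"])))    # ''
```

Because this works on spelling alone, it says nothing about the phonological forms, where the shared portion may differ.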

Headword

A headword, lemma, or catchword[4] is the word under which a set of related dictionary or encyclopaedia entries appears. The headword is used to locate the entry, and dictates its alphabetical position. Depending on the size and nature of the dictionary or encyclopedia, the entry may include alternative meanings of the word, its etymology, pronunciation and inflections, compound words or phrases that contain the headword, and encyclopedic information about the concepts represented by the word.

For example, the headword bread may contain the following (simplified) definitions:

Bread
(noun)
  • A common food made from the combination of flour, water and yeast
  • Money (slang)
(verb)
  • To coat in breadcrumbs
to know which side your bread is buttered: to know how to act in your own best interests.
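The entry above could be modelled as a small record keyed by its headword, with the headword alone deciding the alphabetical position. This is a sketch, not the layout of any particular dictionary; the `Entry` class, its field names, and the extra headwords "go" and "Break" are assumptions added for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    headword: str
    senses: dict = field(default_factory=dict)   # part of speech -> definitions
    phrases: dict = field(default_factory=dict)  # phrase -> meaning


bread = Entry(
    headword="bread",
    senses={
        "noun": [
            "A common food made from the combination of flour, water and yeast",
            "Money (slang)",
        ],
        "verb": ["To coat in breadcrumbs"],
    },
    phrases={
        "to know which side your bread is buttered":
            "to know how to act in your own best interests",
    },
)


def alphabetise(entries):
    """Order entries by headword, ignoring case."""
    return sorted(entries, key=lambda e: e.headword.casefold())


entries = alphabetise([Entry("go"), bread, Entry("Break")])
print([e.headword for e in entries])  # ['bread', 'Break', 'go']
```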

The Academic Dictionary of Lithuanian contains around 500,000 headwords. The Oxford English Dictionary (OED) has around 273,000 headwords along with 220,000 lemmas,[5] while Webster's Third New International Dictionary has about 470,000.[6] The Deutsches Wörterbuch (DWB), the largest lexicon of the German language, has around 330,000 headwords.[7] These values are cited by the dictionary makers and may not use exactly the same definition of a headword. In addition, headwords may not accurately reflect a dictionary's physical size. The OED and the DWB, for instance, include exhaustive historical reviews and exact citations from source documents not usually found in standard dictionaries.

The term 'lemma' comes from the practice in Greco-Roman antiquity of using the word to refer to the headwords of marginal glosses in scholia; for this reason, the Ancient Greek plural form is sometimes used, namely lemmata (Greek λῆμμα, pl. λήμματα).

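See also

  • Lexeme
  • Lexical Markup Framework
  • Null morpheme
  • Principal parts
  • Root (linguistics)
  • Uninflected word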

References

  1. ^ Zgusta, Ladislav (2006). Dolezal, Fredric F.M. (ed.). Lexicography then and now. p. 202. ISBN 3484391294. A minor... problem can arise when the canonical form of the headword, i.e. the form in which it is to be cited, is to be chosen.
  2. ^ Francis, W.N.; Kučera, H (1982). Frequency Analysis of English Usage: Lexicon and Usage. Boston: Houghton Mifflin.
  3. ^ "Natural Language Toolkit — NLTK 3.0 documentation". Nltk.org. 2015-09-05. Retrieved 2015-09-27.
  4. ^ Oxford English Dictionary, 3rd. edition, 2018, s.v., definition 5
  5. ^ "Glossary - Oxford English Dictionary". public.oed.com. Retrieved 3 October 2016.
  6. ^ "Mwunabridged". www.merriam-webster.com. Retrieved 3 October 2016.
  7. ^ The Deutsches Wörterbuch. Archived 2016-08-12 at the Wayback Machine. BBAW. Retrieved 22 June 2012.

External links
