Wikipedia

IETF language tag

An IETF BCP 47 language tag is a standardized code that is used to identify human languages on the Internet. The tag structure has been standardized by the Internet Engineering Task Force (IETF) in Best Current Practice (BCP) 47; the subtags are maintained by the IANA Language Subtag Registry.[1][2][3]

To distinguish language variants for countries, regions, or writing systems (scripts), IETF language tags combine subtags from other standards such as ISO 639, ISO 15924, ISO 3166-1 and UN M.49. For example, the tag "en" stands for English; "es-419" for Latin American Spanish; "rm-sursilv" for Romansh Sursilvan; "sr-Cyrl" for Serbian written in Cyrillic script; "nan-Hant-TW" for Min Nan Chinese using traditional Han characters, as spoken in Taiwan; and "gsw-u-sd-chzh" for Zürich German. Because it follows ISO 639-3, however, it does not provide codes for distinguishing between Arabic-based scripts, maintains two duplicate codes for Punjabi, and inherits a number of dubious or non-existent language distinctions made by its parent standard.[4]

It is used by computing standards such as HTTP,[5] HTML,[6] XML[7] and PNG.[8]

History

IETF language tags were first defined in RFC 1766, edited by Harald Tveit Alvestrand, published in March 1995. The tags used ISO 639 two-letter language codes and ISO 3166 two-letter country codes, and allowed registration of whole tags that included variant or script subtags of three to eight letters.

In January 2001, this was updated by RFC 3066, which added the use of ISO 639-2 three-letter codes, permitted subtags with digits, and adopted the concept of language ranges from HTTP/1.1 to help with matching of language tags.

The next revision of the specification came in September 2006 with the publication of RFC 4646 (the main part of the specification), edited by Addison Phillips and Mark Davis, and RFC 4647 (which deals with matching behaviour). RFC 4646 introduced a more structured format for language tags, added the use of ISO 15924 four-letter script codes and UN M.49 three-digit geographical region codes, and replaced the old registry of tags with a new registry of subtags. The small number of previously defined tags that did not conform to the new structure were grandfathered in order to maintain compatibility with RFC 3066.

The current version of the specification, RFC 5646, was published in September 2009. The main purpose of this revision was to incorporate three-letter codes from ISO 639-3 and 639-5 into the Language Subtag Registry, in order to increase the interoperability between ISO 639 and BCP 47.[9]

Syntax of language tags

Each language tag is composed of one or more "subtags" separated by hyphens (-). Each subtag is composed of basic Latin letters or digits only.

With the exception of private-use language tags beginning with an x- prefix and grandfathered language tags (including those starting with an i- prefix and those previously registered in the old Language Tag Registry), subtags occur in the following order:

  • A single primary language subtag based on a two-letter language code from ISO 639-1 (2002) or a three-letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008), or registered through the BCP 47 process and composed of five to eight letters;
  • Up to three optional extended language subtags composed of three letters each, separated by hyphens; (There is currently no extended language subtag registered in the Language Subtag Registry without an equivalent and preferred primary language subtag. This component of language tags is preserved for backwards compatibility and to allow for future parts of ISO 639.)
  • An optional script subtag, based on a four-letter script code from ISO 15924 (usually written in Title Case);
  • An optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;
  • Optional variant subtags, separated by hyphens, each composed of five to eight letters, or of four characters starting with a digit; (Variant subtags are registered with IANA and not associated with any external standard.)
  • Optional extension subtags, separated by hyphens, each composed of a single character, with the exception of the letter x, and a hyphen followed by one or more subtags of two to eight characters each, separated by hyphens;
  • An optional private-use subtag, composed of the letter x and a hyphen followed by subtags of one to eight characters each, separated by hyphens.
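The ordering in the list above can be sketched as a regular expression. This is a simplified illustration, not the normative grammar: it ignores grandfathered tags and tags made up only of private-use subtags, and the names LANGTAG and parse_tag are invented for this example. It also assumes, following the RFC 5646 grammar, that variant subtags may contain digits as well as letters.

```python
import re

# Simplified pattern for the subtag order listed above (illustrative only).
LANGTAG = re.compile(
    r"""^
    (?P<language>[a-z]{2,3}|[a-z]{5,8})                      # primary language
    (?P<extlang>(?:-[a-z]{3}){0,3})                          # up to three extended language subtags
    (?:-(?P<script>[a-z]{4}))?                               # ISO 15924 script
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?                      # ISO 3166-1 alpha-2 or UN M.49
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)   # registered variants
    (?P<extensions>(?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*)      # singleton plus its subtags
    (?:-(?P<private>x(?:-[a-z0-9]{1,8})+))?                  # private use
    $""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_tag(tag):
    """Split a tag into its components, or return None if it does not fit the pattern."""
    match = LANGTAG.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # Repeated groups come back as "-a-b"; turn them into lists or plain strings.
    parts["extlang"] = parts["extlang"].split("-")[1:]
    parts["variants"] = parts["variants"].split("-")[1:]
    parts["extensions"] = parts["extensions"].lstrip("-")
    return parts


print(parse_tag("nan-Hant-TW"))
```

The pattern checks only that a tag is well-formed; whether each subtag is actually registered has to be checked against the Language Subtag Registry.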

Subtags are not case-sensitive, but the specification recommends using the same case as in the Language Subtag Registry, where region subtags are UPPERCASE, script subtags are Title Case, and all other subtags are lowercase. This capitalization follows the recommendations of the underlying ISO standards.
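The recommended capitalization can be applied without consulting the Registry. The sketch below follows the formatting rule given in RFC 5646: everything is lowercase except two-letter subtags (uppercase) and four-letter subtags (title case) that are neither first in the tag nor located after a single-character subtag. The function name canonical_case is made up for this example.

```python
def canonical_case(tag):
    """Return the tag with the conventional capitalization of each subtag."""
    result = []
    after_singleton = False
    for position, subtag in enumerate(tag.split("-")):
        if position > 0 and not after_singleton and len(subtag) == 2:
            result.append(subtag.upper())        # region, e.g. TW
        elif position > 0 and not after_singleton and len(subtag) == 4:
            result.append(subtag.capitalize())   # script, e.g. Hant
        else:
            result.append(subtag.lower())
        if len(subtag) == 1:
            # Subtags after an extension or private-use singleton stay lowercase.
            after_singleton = True
    return "-".join(result)


for example in ("SR-cyrl", "NAN-hant-tw", "GSW-U-SD-CHZH"):
    print(example, "->", canonical_case(example))
```

Because tags are not case-sensitive, this normalization is cosmetic; comparisons should still be done case-insensitively.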

Optional script and region subtags should be omitted when they add no distinguishing information to a language tag. For example, es is preferred over es-Latn, as Spanish is fully expected to be written in the Latin script; ja is preferred over ja-JP, as Japanese as used in Japan does not differ markedly from Japanese as used elsewhere.

Not all linguistic regions can be represented with a valid region subtag: subnational regional dialects of a primary language are registered as variant subtags. For example, the valencia variant subtag for the Valencian variety of Catalan is registered in the Language Subtag Registry with the prefix ca. As this dialect is spoken almost exclusively in Spain, the region subtag ES can normally be omitted.

Furthermore, some script subtags do not refer to traditional scripts such as Latin, or even to scripts at all; these usually begin with a Z. For example, Zsye refers to emoji, Zmth to mathematical notation, Zxxx to unwritten documents and Zyyy to undetermined scripts.

IETF language tags have been used as locale identifiers in many applications. It may be necessary for these applications to establish their own strategy for defining, encoding and matching locales if the strategy described in RFC 4647 is not adequate.

The use, interpretation and matching of IETF language tags are currently defined in RFC 5646 and RFC 4647. The Language Subtag Registry lists all currently valid public subtags. Private-use subtags are not included in the Registry as they are implementation-dependent and subject to private agreements between the parties using them. These private agreements are out of scope of BCP 47.
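As a rough illustration of the matching described in RFC 4647, the sketch below implements two of its schemes in simplified form: basic filtering (a range matches every tag that equals it or starts with it followed by a hyphen) and lookup (the range is shortened one subtag at a time until a single best tag is found). The function names and the default parameter are choices made for this example, and wildcard handling in lookup is left out.

```python
def basic_filter(language_range, tags):
    """Return every tag matched by the range (prefix match on whole subtags)."""
    wanted = language_range.lower()
    return [
        tag for tag in tags
        if wanted == "*"
        or tag.lower() == wanted
        or tag.lower().startswith(wanted + "-")
    ]


def lookup(language_range, tags, default=None):
    """Return the single best tag by progressively truncating the range."""
    available = {tag.lower(): tag for tag in tags}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        subtags.pop()
        # A range may not end in a single-character subtag, so drop it as well.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


print(basic_filter("de", ["de", "de-CH", "den", "en"]))
print(lookup("zh-Hant-CN-x-private1-private2", ["zh", "zh-Hant"]))
```

Filtering suits cases where several results are acceptable, such as selecting all content in a language; lookup suits cases where exactly one resource must be chosen, which is why it takes a default.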

List of common primary language subtags

The following is a list of some of the more commonly used primary language subtags. The list represents only a small subset (less than 2 percent) of primary language subtags; for full information, the Language Subtag Registry should be consulted directly.

Common languages and their IETF subtags[10]
English name Native name Subtag
Afrikaans Afrikaans af
Amharic አማርኛ am
Arabic العربية ar
Mapudungun Mapudungun arn
Assamese অসমীয়া as
Azerbaijani Azərbaycanlı az
Bashkir Башҡорт ba
Belarusian беларуская be
Bulgarian български bg
Bengali বাংলা bn
Tibetan བོད་ཡིག bo
Breton brezhoneg br
Bosnian bosanski/босански bs
Catalan català ca
Corsican Corsu co
Czech čeština cs
Welsh Cymraeg cy
Danish dansk da
German Deutsch de
Lower Sorbian dolnoserbšćina dsb
Divehi ދިވެހިބަސް dv
Greek ελληνικά el
English English en
Spanish español es
Estonian eesti et
Basque euskara eu
Persian فارسى fa
Finnish suomi fi
Filipino Filipino fil
Faroese føroyskt fo
French français fr
Frisian Frysk fy
Irish Gaeilge ga
Scottish Gaelic Gàidhlig gd
Galician galego gl
Alsatian Elsässisch gsw
Gujarati ગુજરાતી gu
Hausa Hausa ha
Hebrew עברית he
Hindi हिंदी hi
Croatian hrvatski hr
Upper Sorbian hornjoserbšćina hsb
Hungarian magyar hu
Armenian Հայերեն hy
Indonesian Bahasa Indonesia id
Igbo Igbo ig
Yi ꆈꌠꁱꂷ ii
Icelandic íslenska is
Italian italiano it
Inuktitut Inuktitut /ᐃᓄᒃᑎᑐᑦ (ᑲᓇᑕ) iu
Japanese 日本語 ja
Georgian ქართული ka
Kazakh Қазақша kk
Greenlandic kalaallisut kl
Khmer ខ្មែរ km
Kannada ಕನ್ನಡ kn
Korean 한국어/韓國語, 조선말/朝鮮말 ko
Konkani कोंकणी kok
Kyrgyz Кыргыз ky
Luxembourgish Lëtzebuergesch lb
Lao ລາວ lo
Lithuanian lietuvių lt
Latvian latviešu lv
Maori Reo Māori mi
Macedonian македонски јазик mk
Malayalam മലയാളം ml
Mongolian Монгол хэл/ᠮᠤᠨᠭᠭᠤᠯ ᠬᠡᠯᠡ mn
Mohawk Kanien'kéha moh
Marathi मराठी mr
Malay Bahasa Malaysia ms
Maltese Malti mt
Burmese Myanmar my
Norwegian (Bokmål) norsk (bokmål) nb
Nepali नेपाली (नेपाल) ne
Dutch Nederlands nl
Norwegian (Nynorsk) norsk (nynorsk) nn
Norwegian norsk no
Northern Sotho Sesotho sa Leboa nso
Occitan Occitan oc
Odia ଓଡ଼ିଆ or
Punjabi ਪੰਜਾਬੀ pa
Polish polski pl
Dari درى prs
Pashto پښتو ps
Portuguese Português pt
K'iche K'iche quc
Quechua runasimi qu
Romansh Rumantsch rm
Romanian română ro
Russian русский ru
Kinyarwanda Kinyarwanda rw
Sanskrit संस्कृत sa
Yakut саха sah
Sami (Northern) davvisámegiella se
Sinhala සිංහල si
Slovak slovenčina sk
Slovenian slovenski sl
Sami (Southern) åarjelsaemiengiele sma
Sami (Lule) julevusámegiella smj
Sami (Inari) sämikielâ smn
Sami (Skolt) sääm´ǩiõll sms
Albanian shqipe sq
Serbian srpski/српски sr
Swedish svenska sv
Kiswahili Kiswahili sw
Syriac ܣܘܪܝܝܐ syc
Tamil தமிழ் ta
Telugu తెలుగు te
Tajik Тоҷикӣ tg
Thai ไทย th
Turkmen türkmençe tk
Tswana Setswana tn
Turkish Türkçe tr
Tatar Татарча tt
Tamazight Tamazight tzm
Uyghur ئۇيغۇرچە ug
Ukrainian українська uk
Urdu اُردو ur
Uzbek U'zbek/Ўзбек uz
Vietnamese Tiếng Việt/㗂越 vi
Wolof Wolof wo
Xhosa isiXhosa xh
Yoruba Yoruba yo
Chinese 中文 zh
Zulu isiZulu zu

Relation to other standards

Although some types of subtags are derived from ISO or UN core standards, they do not follow these standards absolutely, as this could lead to the meaning of language tags changing over time. In particular, a subtag derived from a code assigned by ISO 639, ISO 15924, ISO 3166, or UN M.49 remains a valid (though deprecated) subtag even if the code is withdrawn from the corresponding core standard. If the standard later assigns a new meaning to the withdrawn code, the corresponding subtag will still retain its old meaning.

This stability was introduced in RFC 4646.

ISO 639-3 and ISO 639-1

RFC 4646 defined the concept of an "extended language subtag" (sometimes referred to as extlang), although no such subtags were registered at that time.[11][12]

RFC 5645 and RFC 5646 added primary language subtags corresponding to ISO 639-3 codes for all languages that did not already exist in the Registry. In addition, codes for languages encompassed by certain macrolanguages were registered as extended language subtags. Sign languages were also registered as extlangs, with the prefix sgn. These languages may be represented either with the subtag for the encompassed language alone (cmn for Mandarin) or with a language-extlang combination (zh-cmn). The first option is preferred for most purposes. The second option is called "extlang form" and is new in RFC 5646.

Whole tags that were registered prior to RFC 4646 and are now classified as "grandfathered" or "redundant" (depending on whether they fit the new syntax) are deprecated in favor of the corresponding ISO 639-3–based language subtag, if one exists. To list a few examples, nan is preferred over zh-min-nan for Min Nan Chinese; hak is preferred over i-hak and zh-hakka for Hakka Chinese; and ase is preferred over sgn-US for American Sign Language.
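An application that accepts older tags can map them to their preferred replacements before further processing. The table below contains only the examples mentioned above; a real implementation would read the preferred values from the Language Subtag Registry. The names PREFERRED and preferred_tag are invented for this sketch.

```python
# Deprecated whole tags and their preferred replacements (sample entries only).
PREFERRED = {
    "zh-min-nan": "nan",  # Min Nan Chinese
    "i-hak": "hak",       # Hakka Chinese
    "zh-hakka": "hak",    # Hakka Chinese
    "sgn-us": "ase",      # American Sign Language
}


def preferred_tag(tag):
    """Return the preferred replacement for a deprecated tag, or the tag itself."""
    return PREFERRED.get(tag.lower(), tag)


print(preferred_tag("zh-min-nan"))
print(preferred_tag("sgn-US"))
```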

ISO 639-5 and ISO 639-2

ISO 639-5 defines language collections with alpha-3 codes differently from how they were initially encoded in ISO 639-2 (including one code already present in ISO 639-1). Specifically, ISO 639-5 defines all language collections as inclusive, whereas ISO 639-2 defined some of them exclusively. As a result, some collections now have a broader scope than before and can encompass languages that were already encoded separately within ISO 639-2.

For example, the ISO 639-2 code afa was previously associated with the name "Afro-Asiatic (Other)", excluding languages such as Arabic that already had their own code. In ISO 639-5, this collection is named "Afro-Asiatic languages" and includes all such languages. ISO 639-2 changed the exclusive names in 2009 to match the inclusive ISO 639-5 names.[13]

To avoid breaking implementations that may still depend on the older (exclusive) definition of these collections, ISO 639-5 defines a grouping type attribute for all collections that were already encoded in ISO 639-2 (such grouping type is not defined for the new collections added only in ISO 639-5).

BCP 47 defines a "Scope" property to identify subtags for language collections. However, it does not define any given collection as inclusive or exclusive, and does not use the ISO 639-5 grouping type attribute, although the description fields in the Language Subtag Registry for these subtags match the ISO 639-5 (inclusive) names. As a consequence, BCP 47 language tags that include a primary language subtag for a collection may be ambiguous as to whether the collection is intended to be inclusive or exclusive.

ISO 639-5 does not define precisely which languages are members of these collections; only the hierarchical classification of collections is defined, using the inclusive definition of these collections. Because of this, RFC 5646 does not recommend the use of subtags for language collections for most applications, although they are still preferred over subtags whose meaning is even less specific, such as "Multiple languages" and "Undetermined".

In contrast, the classification of individual languages within their macrolanguage is standardized, in both ISO 639-3 and the Language Subtag Registry.

ISO 15924, ISO/IEC 10646 and Unicode

Script subtags were first added to the Language Subtag Registry when RFC 4646 was published, from the list of codes defined in ISO 15924. They are encoded in the language tag after primary and extended language subtags, but before other types of subtag, including region and variant subtags.

Some primary language subtags are defined with a property named "Suppress-Script" which indicates the cases where a single script can usually be assumed by default for the language, even if it can be written with another script. When this is the case, it is preferable to omit the script subtag, to improve the likelihood of successful matching. A different script subtag can still be appended to make the distinction when necessary. For example, yi is preferred over yi-Hebr in most contexts, because the Hebrew script subtag is assumed for the Yiddish language.
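The effect of "Suppress-Script" can be sketched as a small normalization step. The table here holds only the two languages used as examples in this article (Spanish and Yiddish); the full set of values would come from the Language Subtag Registry. The names SUPPRESS_SCRIPT and drop_default_script are made up for this example, and the function only handles tags in which the script directly follows the primary language subtag.

```python
# Sample Suppress-Script values; the Registry is the authoritative source.
SUPPRESS_SCRIPT = {
    "es": "Latn",
    "yi": "Hebr",
}


def drop_default_script(tag):
    """Remove the script subtag when it is the default script for the language."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        default_script = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if default_script is not None and subtags[1].lower() == default_script.lower():
            del subtags[1]
    return "-".join(subtags)


print(drop_default_script("yi-Hebr"))
print(drop_default_script("yi-Latn"))
```

A tag such as yi-Latn is left unchanged, because a script other than the default one still carries information.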

As another example, zh-Hans-SG may be considered equivalent to zh-Hans, because the region code is probably not significant; the written form of Chinese used in Singapore uses the same simplified Chinese characters as in other countries where Chinese is written. However, the script subtag is maintained because it is significant.

Note that ISO 15924 includes some codes for script variants (for example, Hans and Hant for simplified and traditional forms of Chinese characters) that are unified within Unicode and ISO/IEC 10646. These script variants are most often encoded for bibliographic purposes, but are not always significant from a linguistic point of view (for example, Latf and Latg script codes for the Fraktur and Gaelic variants of the Latin script, which are mostly encoded with regular Latin letters in Unicode and ISO/IEC 10646). They may occasionally be useful in language tags to expose orthographic or semantic differences, with different analysis of letters, diacritics, and digraphs/trigraphs as default grapheme clusters, or differences in letter casing rules.

ISO 3166-1 and UN M.49

Two-letter region subtags are based on codes assigned, or "exceptionally reserved", in ISO 3166-1. If the ISO 3166 Maintenance Agency were to reassign a code that had previously been assigned to a different country, the existing BCP 47 subtag corresponding to that code would retain its meaning, and a new region subtag based on UN M.49 would be registered for the new country. UN M.49 is also the source for numeric region subtags for geographical regions, such as 005 for South America. The UN M.49 codes for economic regions are not allowed.

Region subtags are used to specify the variety of a language "as used in" a particular region. They are appropriate when the variety is regional in nature, and can be captured adequately by identifying the countries involved, as when distinguishing British English (en-GB) from American English (en-US). When the difference is one of script or script variety, as for simplified versus traditional Chinese characters, it should be expressed with a script subtag instead of a region subtag; in this example, zh-Hans and zh-Hant should be used instead of zh-CN and zh-HK.

When a distinct language subtag exists for a language that could be considered a regional variety, it is often preferable to use the more specific subtag instead of a language-region combination. For example, ar-DZ (Arabic as used in Algeria) may be better expressed as arq for Algerian Spoken Arabic.

Extensions

Extension subtags (not to be confused with extended language subtags) allow additional information to be attached to a language tag that does not necessarily serve to identify a language. One use for extensions is to encode locale information, such as calendar and currency.

Extension subtags are composed of multiple hyphen-separated character strings, starting with a single character (other than x), called a singleton. Each extension is described in its own IETF RFC, which identifies a Registration Authority to manage the data for that extension. IANA is responsible for allocating singletons.
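The singleton structure makes it straightforward to separate extensions from the rest of a tag. The following sketch assumes a well-formed tag that is neither grandfathered nor purely private-use; the function name split_extensions and the shape of its return value are choices made for this example.

```python
def split_extensions(tag):
    """Split a tag into (base tag, {singleton: [subtags]}, private-use subtags)."""
    subtags = tag.lower().split("-")
    base = []
    extensions = {}
    private_use = []
    current = None
    for position, subtag in enumerate(subtags):
        if position > 0 and subtag == "x":
            # Everything after x belongs to the private-use sequence.
            private_use = subtags[position + 1:]
            break
        if position > 0 and len(subtag) == 1:
            # A singleton starts a new extension.
            current = extensions.setdefault(subtag, [])
        elif current is None:
            base.append(subtag)
        else:
            current.append(subtag)
    return "-".join(base), extensions, private_use


print(split_extensions("gsw-u-sd-chzh"))
```

The base tag is returned in lowercase because the function normalizes case before splitting.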

Two extensions have been assigned as of January 2014.

Extension T (Transformed Content)

Extension T allows a language tag to include information on how the tagged data was transliterated, transcribed, or otherwise transformed. For example, the tag en-t-ja could be used for content in English that was translated from the original Japanese. Additional substrings could indicate that the translation was done mechanically, or in accordance with a published standard.

Extension T is described in the informational RFC 6497, published in February 2012.[14] The Registration Authority is the Unicode Consortium.

Extension U (Unicode Locale)

Extension U allows a wide variety of locale attributes found in the Common Locale Data Repository (CLDR) to be embedded in language tags. These attributes include country subdivisions, calendar and time zone data, collation order, currency, number system, and keyboard identification.

One example is gsw-u-sd-chzh for Zürich German, in which the key sd (country subdivision) carries the value chzh.

Extension U is described in the informational RFC 6067, published in December 2010.[15] The Registration Authority is the Unicode Consortium.
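Within Extension U, attributes are expressed as keywords: a two-character key followed by one or more longer subtags that form its value. The sketch below extracts those keywords from a tag. It is a simplification that assumes a well-formed tag with no private-use sequence before the u singleton and ignores attributes that precede the first key. The function name is invented for this example, and the sample keys ca (calendar), tz (time zone) and nu (number system) are CLDR keys not otherwise mentioned in this article.

```python
def parse_unicode_extension(tag):
    """Return the Extension U keywords of a tag as a {key: value} dict."""
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return {}
    keywords = {}
    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            break                         # next singleton: end of Extension U
        if len(subtag) == 2:
            key = subtag                  # a two-character subtag starts a keyword
            keywords[key] = []
        elif key is not None:
            keywords[key].append(subtag)  # value subtags of the current key
    return {k: "-".join(v) for k, v in keywords.items()}


print(parse_unicode_extension("gsw-u-sd-chzh"))
print(parse_unicode_extension("he-IL-u-ca-hebrew-tz-jeruslm"))
```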

References

  1. ^ "Language Subtag Registry". iana.org. Internet Assigned Numbers Authority. Retrieved 2018-12-05.
  2. ^ "Language Tag Extensions Registry". iana.org. Internet Assigned Numbers Authority. Retrieved 2018-12-06.
  3. ^ "IANA — Protocol Registries". iana.org. Retrieved 28 July 2015.
  4. ^ Ewell, Doug (2022-08-12). "Re: [Ietf-languages] Punjabi language code fix recommendations". Retrieved 2022-08-12.
  5. ^ Fielding, Roy T.; Reschke, Julian F., eds. (June 2014). "Language Tags". Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. sec. 3.1.3.1. doi:10.17487/RFC7231. RFC 7231.
  6. ^ "Language information and text direction". w3.org. Retrieved 28 July 2015.
  7. ^ "Extensible Markup Language (XML) 1.0 (Fifth Edition)". w3.org. Retrieved 28 July 2015.
  8. ^ "Portable Network Graphics (PNG) Specification (Second Edition)". w3.org. Retrieved 28 July 2015.
  9. ^ Language Tag Registry Update charter. Archived 2007-02-10 at the Wayback Machine.
  10. ^ "Letter Codes of Cultures - List". Retrieved 2022-01-08.
  11. ^ Addison Phillips, Mark Davis (2008). "Tags for Identifying Languages (old draft for the revision of RFC 4646, now obsolete and may disappear soon)". IETF WG LTRU. Retrieved 2008-06-23.
  12. ^ Doug Ewell (2008). "Update to the Language Subtag Registry (old draft for the revision of RFC 4645, now obsolete and may disappear soon)" (1MB). IETF WG LTRU. Retrieved 2008-06-23.
  13. ^ "ISO 639-2 Language Code List - Codes for the representation of names of languages (Library of Congress)". loc.gov. Retrieved 28 July 2015.
  14. ^ Davis, M.; Phillips, A.; Umaoka, Y.; Falk, C. (February 2012). "BCP 47 Extension T - Transformed Content". rfc-editor.org. RFC Editor (informational). doi:10.17487/RFC6497. RFC 6497. Retrieved 24 June 2022.
  15. ^ Davis, M.; Phillips, A.; Umaoka, Y. (December 2010). "BCP 47 Extension U". rfc-editor.org. RFC Editor (informational). doi:10.17487/RFC6067. RFC 6067. Retrieved 24 June 2022.

External links

  • BCP 47 Language Tags – current specification (contains two RFCs, RFC 5646 and RFC 4647 published separately at different dates, but concatenated in a single document)
    • (also referencing the related informational RFC 5645, which complements the previous informational RFC 4645, as well as other individual registration forms published separately by others for each language added or modified in the Registry between these BCP 47 revisions)
  • Language Subtag Registry – maintained by IANA
  • Language Subtag Registry Search – find subtags and view entries in the Registry
  • Language tags in HTML and XML – from the W3C
  • Language Tags (archived 2017-10-19 at the Wayback Machine) – from the IETF Language Tag Registry Update working group

ietf, language, this, article, external, links, follow, wikipedia, policies, guidelines, please, improve, this, article, removing, excessive, inappropriate, external, links, converting, useful, links, where, appropriate, into, footnote, references, august, 202. This article s use of external links may not follow Wikipedia s policies or guidelines Please improve this article by removing excessive or inappropriate external links and converting useful links where appropriate into footnote references August 2020 Learn how and when to remove this template message An IETF BCP 47 language tag is a standardized code or tag that is used to identify human languages in the Internet The tag structure has been standardized by the Internet Engineering Task Force IETF in Best Current Practice BCP 47 the subtags are maintained by the IANA Language Subtag Registry 1 2 3 To distinguish language variants for countries regions or writing systems scripts IETF language tags combine subtags from other standards such as ISO 639 ISO 15924 ISO 3166 1 and UN M 49 For example the tag en stands for English es 419 for Latin American Spanish rm sursilv for Romansh Sursilvan sr Cyrl for Serbian written in Cyrillic script nan Hant TW for Min Nan Chinese using traditional Han characters as spoken in Taiwan and gsw u sd chzh for Zurich German In its accordance with ISO 639 3 however it does not provide codes for distinguishing between Arabic based scripts and maintains two duplicate codes for Punjabi as well as a number of dubious or non existent language distinctions made by its parents standard 4 It is used by computing standards such as HTTP 5 HTML 6 XML 7 and PNG 8 Contents 1 History 2 Syntax of language tags 3 List of common primary language subtags 4 Relation to other standards 4 1 ISO 639 3 and ISO 639 1 4 2 ISO 639 5 and ISO 639 2 4 3 ISO 15924 ISO IEC 10646 and Unicode 4 4 ISO 3166 1 and UN M 49 5 Extensions 5 1 Extension T Transformed Content 5 2 Extension U Unicode Locale 6 See also 7 
References 8 External linksHistory EditIETF language tags were first defined in RFC 1766 edited by Harald Tveit Alvestrand published in March 1995 The tags used ISO 639 two letter language codes and ISO 3166 two letter country codes and allowed registration of whole tags that included variant or script subtags of three to eight letters In January 2001 this was updated by RFC 3066 which added the use of ISO 639 2 three letter codes permitted subtags with digits and adopted the concept of language ranges from HTTP 1 1 to help with matching of language tags The next revision of the specification came in September 2006 with the publication of RFC 4646 the main part of the specification edited by Addison Philips and Mark Davis and RFC 4647 which deals with matching behaviour RFC 4646 introduced a more structured format for language tags added the use of ISO 15924 four letter script codes and UN M 49 three digit geographical region codes and replaced the old registry of tags with a new registry of subtags The small number of previously defined tags that did not conform to the new structure were grandfathered in order to maintain compatibility with RFC 3066 The current version of the specification RFC 5646 was published in September 2009 The main purpose of this revision was to incorporate three letter codes from ISO 639 3 and 639 5 into the Language Subtag Registry in order to increase the interoperability between ISO 639 and BCP 47 9 Syntax of language tags EditEach language tag is composed of one or more subtags separated by hyphens Each subtag is composed of basic Latin letters or digits only With the exceptions of private use language tags beginning with an x prefix and grandfathered language tags including those starting with an i prefix and those previously registered in the old Language Tag Registry subtags occur in the following order A single primary language subtag based on a two letter language code from ISO 639 1 2002 or a three letter code from ISO 639 2 
1998 ISO 639 3 2007 or ISO 639 5 2008 or registered through the BCP 47 process and composed of five to eight letters Up to three optional extended language subtags composed of three letters each separated by hyphens There is currently no extended language subtag registered in the Language Subtag Registry without an equivalent and preferred primary language subtag This component of language tags is preserved for backwards compatibility and to allow for future parts of ISO 639 An optional script subtag based on a four letter script code from ISO 15924 usually written in Title Case An optional region subtag based on a two letter country code from ISO 3166 1 alpha 2 usually written in upper case or a three digit code from UN M 49 for geographical regions Optional variant subtags separated by hyphens each composed of five to eight letters or of four characters starting with a digit Variant subtags are registered with IANA and not associated with any external standard Optional extension subtags separated by hyphens each composed of a single character with the exception of the letter x and a hyphen followed by one or more subtags of two to eight characters each separated by hyphens An optional private use subtag composed of the letter x and a hyphen followed by subtags of one to eight characters each separated by hyphens Subtags are not case sensitive but the specification recommends using the same case as in the Language Subtag Registry where region subtags are UPPERCASE script subtags are Title Case and all other subtags are lowercase This capitalization follows the recommendations of the underlying ISO standards Optional script and region subtags are preferred to be omitted when they add no distinguishing information to a language tag For example es is preferred over es Latn as Spanish is fully expected to be written in the Latin script ja is preferred over ja JP as Japanese as used in Japan does not differ markedly from Japanese as used elsewhere Not all linguistic 
regions can be represented with a valid region subtag the subnational regional dialects of a primary language are registered as variant subtags For example the valencia variant subtag for the Valencian variant of the Catalan is registered in the Language Subtag Registry with the prefix ca As this dialect is spoken almost exclusively in Spain the region subtag ES can normally be omitted Furthermore there are script tags that do not refer to traditional scripts such as Latin or even scripts at all and these usually begin with a Z For example Zsye refers to emojis Zmth to mathematical notation Zxxx to unwritten documents and Zyyy to undetermined scripts IETF language tags have been used as locale identifiers in many applications It may be necessary for these applications to establish their own strategy for defining encoding and matching locales if the strategy described in RFC 4647 is not adequate The use interpretation and matching of IETF language tags is currently defined in RFC 5646 and RFC 4647 The Language Subtag Registry lists all currently valid public subtags Private use subtags are not included in the Registry as they are implementation dependent and subject to private agreements between third parties using them These private agreements are out of scope of BCP 47 List of common primary language subtags EditThe following is a list of some of the more commonly used primary language subtags The list represents only a small subset less than 2 percent of primary language subtags for full information the Language Subtag Registry should be consulted directly Common languages and their IETF subtags 10 English name Native name SubtagAfrikaans Afrikaans afAmharic አማርኛ amArabic العربية arMapudungun Mapudungun arnAssamese অসম য asAzerbaijani Azerbaycan li azBashkir Bashҡort baBelarusian belaruskaya beBulgarian blgarski bgBengali ব ল bnTibetan བ ད ཡ ག boBreton brezhoneg brBosnian bosanski bosanski bsCatalan catala caCorsican Corsu coCzech cestina csWelsh Cymraeg cyDanish 
dansk daGerman Deutsch deLower Sorbian dolnoserbscina dsbDivehi ދ ވ ހ ބ ސ dvGreek ellhnika elEnglish English enSpanish espanol esEstonian eesti etBasque euskara euPersian فارسى faFinnish suomi fiFilipino Filipino filFaroese foroyskt foFrench francais frFrisian Frysk fyIrish Gaeilge gaScottish Gaelic Gaidhlig gdGalician galego glAlsatian Elsassisch gswGujarati ગ જર ત guHausa Hausa haHebrew עברית heHindi ह द hiCroatian hrvatski hrUpper Sorbian hornjoserbscina hsbHungarian magyar huArmenian Հայերեն hyIndonesian Bahasa Indonesia idIgbo Igbo igYi ꆈꌠꁱꂷ iiIcelandic islenska isItalian italiano itInuktitut Inuktitut ᐃᓄᒃᑎᑐᑦ ᑲᓇᑕ iuJapanese 日本語 jaGeorgian ქართული kaKazakh Қazaksha kkGreenlandic kalaallisut klKhmer ខ ម រ kmKannada ಕನ ನಡ knKorean 한국어 韓國語조선말 朝鮮말 koKonkani क कण kokKyrgyz Kyrgyz kyLuxembourgish Letzebuergesch lbLao ລາວ loLithuanian lietuviu ltLatvian latviesu lvMaori Reo Maori miMacedonian makedonski јazik mkMalayalam മലയ ള mlMongolian Mongol hel ᠮᠤᠨᠭᠭᠤᠯ ᠬᠡᠯᠡ mnMohawk Kanien keha mohMarathi मर ठ mrMalay Bahasa Malaysia msMaltese Malti mtBurmese Myanmar myNorwegian Bokmal norsk bokmal nbNepali न प ल न प ल neDutch Nederlands nlNorwegian Nynorsk norsk nynorsk nnNorwegian norsk noSesotho Sesotho sa Leboa stOccitan Occitan ocOdia ଓଡ ଆ orPunjabi ਪ ਜ ਬ paPolish polski plDari درى prsPashto پښتو psPortuguese Portugues ptK iche K iche qucQuechua runasimi quRomansh Rumantsch rmRomanian romană roRussian russkij ruKinyarwanda Kinyarwanda rwSanskrit स स क त saYakut saha sahSami Northern davvisamegiella seSinhala ස හල siSlovak slovencina skSlovenian slovenski slSami Southern aarjelsaemiengiele smaSami Lule julevusamegiella smjSami Inari samikiela smnSami Skolt saam ǩioll smsAlbanian shqipe sqSerbian srpski srpski srSwedish svenska svKiswahili Kiswahili swSyriac ܣܘܪܝܝܐ sycTamil தம ழ taTelugu త ల గ teTajik Toҷikӣ tgThai ithy thTurkmen turkmence tkTswana Setswana tnTurkish Turkce trTatar Tatarcha ttTamazight Tamazight tzmUyghur ئۇيغۇرچە ugUkrainian ukrayinska ukUrdu ا ردو urUzbek U 
zbek Ўzbek uzVietnamese Tiếng Việt 㗂越 viWolof Wolof woXhosa isiXhosa xhYoruba Yoruba yoChinese 中文 zhZulu isiZulu zuRelation to other standards EditAlthough some types of subtags are derived from ISO or UN core standards they do not follow these standards absolutely as this could lead to the meaning of language tags changing over time In particular a subtag derived from a code assigned by ISO 639 ISO 15924 ISO 3166 or UN M 49 remains a valid though deprecated subtag even if the code is withdrawn from the corresponding core standard If the standard later assigns a new meaning to the withdrawn code the corresponding subtag will still retain its old meaning This stability was introduced in RFC 4646 ISO 639 3 and ISO 639 1 Edit RFC 4646 defined the concept of an extended language subtag sometimes referred to as extlang although no such subtags were registered at that time 11 failed verification 12 failed verification RFC 5645 and RFC 5646 added primary language subtags corresponding to ISO 639 3 codes for all languages that did not already exist in the Registry In addition codes for languages encompassed by certain macrolanguages were registered as extended language subtags Sign languages were also registered as extlangs with the prefix sgn These languages may be represented either with the subtag for the encompassed language alone cmn for Mandarin or with a language extlang combination zh cmn The first option is preferred for most purposes The second option is called extlang form and is new in RFC 5646 Whole tags that were registered prior to RFC 4646 and are now classified as grandfathered or redundant depending on whether they fit the new syntax are deprecated in favor of the corresponding ISO 639 3 based language subtag if one exists To list a few examples nan is preferred over zh min nan for Min Nan Chinese hak is preferred over i hak and zh hakka for Hakka Chinese and ase is preferred over sgn US for American Sign Language ISO 639 5 and ISO 639 2 Edit ISO 639 5 
defines language collections with alpha-3 codes in a different way than they were initially encoded in ISO 639-2 (including one code already present in ISO 639-1). Specifically, the language collections are now all defined in ISO 639-5 as inclusive, rather than some of them being defined exclusively. This means that language collections have a broader scope than before, in some cases where they could encompass languages that were already encoded separately within ISO 639-2.

For example, the ISO 639-2 code "afa" was previously associated with the name "Afro-Asiatic (Other)", excluding languages such as Arabic that already had their own code. In ISO 639-5, this collection is named "Afro-Asiatic languages" and includes all such languages. ISO 639-2 changed the exclusive names in 2009 to match the inclusive ISO 639-5 names.[13]

To avoid breaking implementations that may still depend on the older (exclusive) definition of these collections, ISO 639-5 defines a grouping type attribute for all collections that were already encoded in ISO 639-2 (such grouping type is not defined for the new collections added only in ISO 639-5).

BCP 47 defines a "Scope" property to identify subtags for language collections. However, it does not define any given collection as inclusive or exclusive, and does not use the ISO 639-5 grouping type attribute, although the description fields in the Language Subtag Registry for these subtags match the ISO 639-5 (inclusive) names. As a consequence, BCP 47 language tags that include a primary language subtag for a collection may be ambiguous as to whether the collection is intended to be inclusive or exclusive.

ISO 639-5 does not define precisely which languages are members of these collections; only the hierarchical classification of collections is defined, using the inclusive definition of these collections. Because of this, RFC 5646 does not recommend the use of subtags for language collections for most applications, although they are still preferred over subtags whose meaning is even less
specific, such as "Multiple languages" and "Undetermined".

In contrast, the classification of individual languages within their macrolanguage is standardized, in both ISO 639-3 and the Language Subtag Registry.

ISO 15924, ISO/IEC 10646 and Unicode

Script subtags were first added to the Language Subtag Registry when RFC 4646 was published, from the list of codes defined in ISO 15924. They are encoded in the language tag after primary and extended language subtags, but before other types of subtag, including region and variant subtags.

Some primary language subtags are defined with a property named "Suppress-Script", which indicates the cases where a single script can usually be assumed by default for the language, even if it can be written with another script. When this is the case, it is preferable to omit the script subtag, to improve the likelihood of successful matching. A different script subtag can still be appended to make the distinction when necessary. For example, "yi" is preferred over "yi-Hebr" in most contexts, because the Hebrew script subtag is assumed for the Yiddish language.

As another example, "zh-Hans-SG" may be considered equivalent to "zh-Hans", because the region code is probably not significant; the written form of Chinese used in Singapore uses the same simplified Chinese characters as in other countries where Chinese is written. However, the script subtag is maintained because it is significant.

Note that ISO 15924 includes some codes for script variants (for example, "Hans" and "Hant" for simplified and traditional forms of Chinese characters) that are unified within Unicode and ISO/IEC 10646. These script variants are most often encoded for bibliographic purposes, but are not always significant from a linguistic point of view (for example, "Latf" and "Latg" script codes for the Fraktur and Gaelic variants of the Latin script, which are mostly encoded with regular Latin letters in Unicode and ISO/IEC 10646). They may occasionally be useful in language tags to expose orthographic or semantic
differences, with different analysis of letters, diacritics, and digraphs/trigraphs as default grapheme clusters, or differences in letter casing rules.

ISO 3166-1 and UN M.49

Further information: Country code top-level domain § Historical ccTLDs

Two-letter region subtags are based on codes assigned, or "exceptionally reserved", in ISO 3166-1. If the ISO 3166 Maintenance Agency were to reassign a code that had previously been assigned to a different country, the existing BCP 47 subtag corresponding to that code would retain its meaning, and a new region subtag based on UN M.49 would be registered for the new country. UN M.49 is also the source for numeric region subtags for geographical regions, such as "005" for South America. The UN M.49 codes for economic regions are not allowed.

Region subtags are used to specify the variety of a language "as used in" a particular region. They are appropriate when the variety is regional in nature and can be captured adequately by identifying the countries involved, as when distinguishing British English ("en-GB") from American English ("en-US"). When the difference is one of script or script variety, as for simplified versus traditional Chinese characters, it should be expressed with a script subtag instead of a region subtag; in this example, "zh-Hans" and "zh-Hant" should be used instead of "zh-CN" and "zh-HK".

When a distinct language subtag exists for a language that could be considered a regional variety, it is often preferable to use the more specific subtag instead of a language-region combination. For example, "ar-DZ" (Arabic as used in Algeria) may be better expressed as "arq" for Algerian Spoken Arabic.

Extensions

Extension subtags (not to be confused with extended language subtags) allow additional information to be attached to a language tag that does not necessarily serve to identify a language. One use for extensions is to encode locale information, such as calendar and currency.

Extension subtags are composed of multiple hyphen-separated character strings, starting
with a single character (other than "x"), called a singleton. Each extension is described in its own IETF RFC, which identifies a Registration Authority to manage the data for that extension. IANA is responsible for allocating singletons.

Two extensions have been assigned as of January 2014.

Extension T (Transformed Content)

Extension T allows a language tag to include information on how the tagged data was transliterated, transcribed, or otherwise transformed. For example, the tag "en-t-ja" could be used for content in English that was translated from the original Japanese. Additional substrings could indicate that the translation was done mechanically, or in accordance with a published standard.

Extension T is described in the informational RFC 6497, published in February 2012.[14] The Registration Authority is the Unicode Consortium.

Extension U (Unicode Locale)

Extension U allows a wide variety of locale attributes found in the Common Locale Data Repository (CLDR) to be embedded in language tags. These attributes include country subdivisions, calendar and time zone data, collation order, currency, number system, and keyboard identification.

Some examples include:

"gsw-u-sd-chzh" represents Swiss German as used in the Canton of Zurich.
"ar-u-nu-latn" represents Arabic-language content using Basic Latin digits (0 through 9) instead of Arabic-script digits (٠ through ٩).
"he-IL-u-ca-hebrew-tz-jeruslm" represents Hebrew as spoken in Israel, using the traditional Hebrew calendar, and in the "Asia/Jerusalem" time zone as identified in the tz database.

Extension U is described in the informational RFC 6067, published in December 2010.[15] The Registration Authority is the Unicode Consortium.

See also

Codes for constructed languages
Internationalization and localization
Locale (computer software)

References

1. Language Subtag Registry. iana.org. Internet Assigned Numbers Authority. Retrieved 2018-12-05.
2. Language Tag Extensions Registry. iana.org. Internet Assigned Numbers Authority. Retrieved 2018-12-06.
3. IANA Protocol
Registries. iana.org. Retrieved 28 July 2015.
4. Ewell, Doug (2022-08-12). Re: [Ietf-languages] Punjabi language code fix recommendations. Retrieved 2022-08-12.
5. Fielding, Roy T.; Reschke, Julian F., eds. (June 2014). Language Tags. Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. sec. 3.1.3.1. doi:10.17487/RFC7231. RFC 7231.
6. Language information and text direction. w3.org. Retrieved 28 July 2015.
7. Extensible Markup Language (XML) 1.0 (Fifth Edition). w3.org. Retrieved 28 July 2015.
8. Portable Network Graphics (PNG) Specification (Second Edition). w3.org. Retrieved 28 July 2015.
9. Language Tag Registry Update charter. Archived 2007-02-10 at the Wayback Machine.
10. Letter Codes of Cultures List. Retrieved 2022-01-08.
11. Addison Phillips, Mark Davis (2008). Tags for Identifying Languages (old draft for the revision of RFC 4646, now obsolete and may disappear soon). IETF WG LTRU. Retrieved 2008-06-23.
12. Doug Ewell (2008). Update to the Language Subtag Registry (old draft for the revision of RFC 4645, now obsolete and may disappear soon; 1MB). IETF WG LTRU. Retrieved 2008-06-23.
13. ISO 639-2 Language Code List: Codes for the representation of names of languages (Library of Congress). loc.gov. Retrieved 28 July 2015.
14. Davis, M.; Phillips, A.; Umaoka, Y.; Falk, C. (February 2012). BCP 47 Extension T: Transformed Content. rfc-editor.org. RFC Editor (informational). doi:10.17487/RFC6497. RFC 6497. Retrieved 24 June 2022.
15. Davis, M.; Phillips, A.; Umaoka, Y. (December 2010). BCP 47 Extension U. rfc-editor.org. RFC Editor (informational). doi:10.17487/RFC6067. RFC 6067. Retrieved 24 June 2022.

External links

BCP 47 Language Tags: the current specification. It contains two RFCs, RFC 5646 and RFC 4647, published separately at different dates but concatenated in a single document. It also references the related informational RFC 5645, which complements the previous informational RFC 4645, as well as other individual registration forms published separately by others for each language added or modified in the Registry between
these BCP 47 revisions.
Language Subtag Registry: maintained by IANA.
Language Subtag Registry Search: find subtags and view entries in the Registry.
Language tags in HTML and XML: from the W3C.
Language Tags (archived 2017-10-19 at the Wayback Machine): from the IETF Language Tag Registry Update working group.

Retrieved from "https://en.wikipedia.org/w/index.php?title=IETF_language_tag&oldid=1132785030"
