T.51/ISO/IEC 6937

T.51 / ISO/IEC 6937:2001, Information technology - Coded graphic character set for text communication - Latin alphabet, is a multibyte extension of ASCII, or more precisely of ISO/IEC 646-IRV.[1] It was developed jointly with ITU-T (then CCITT) for telematic services under the name T.51, and first became an ISO standard in 1983. Certain byte codes are used as lead bytes for letters with diacritics (accents). The value of the lead byte often indicates which diacritic the letter carries, and the follow byte holds the ASCII value of the letter that the diacritic is placed on.

T.51
Latin based coded character sets for telematic services
Status: In force
Year started: 1984
Latest version: (09/92), September 1992
Organization: ITU-T
Committee: Study Group VIII
Related standards: T.61, ETS 300 706, ISO/IEC 10367, ISO/IEC 2022, ISO 5426
Domain: encoding
License: Freely available
Website: https://www.itu.int/rec/T-REC-T.51

T.51
Alias(es):
  • Code page 20269
  • ISO-IR-90 (old)
  • ISO-IR-142 (old)
  • ISO-IR-156
Standard: ISO/IEC 6937, ITU T.51
Based on: ITU T.61
Other related encoding(s): ETS 300 706, ISO 5426, NeXT Multinational, PostScript Standard Encoding, ITU T.101

ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf.

ISO 6937/2 defines 327 characters found in modern European languages that use the Latin alphabet. Non-Latin European scripts, such as Cyrillic and Greek, are not included in the standard. Some diacritics used with the Latin alphabet, such as the Romanian comma below, are also not included; the cedilla is used instead, because no distinction between cedilla and comma below was made at the time.

IANA has registered the charset names ISO_6937-2-25 and ISO_6937-2-add for two older versions of this standard (plus control codes). In practice, however, this character encoding is unused on the Internet.

Single byte characters

The primary set (first half) originally followed ISO 646-IRV before the ISO/IEC 646:1991 revision, that is, mostly following ASCII but with character 0x24 still denoted as an "international currency sign" (¤) instead of the dollar sign ($). The 1992 edition of ITU T.51 permits existing CCITT services to continue to interpret 0x24 as the international currency sign, but stipulates that new telecommunication applications should use it for the dollar sign (i.e. following the current ISO 646-IRV), and instead represent the international currency sign using the supplementary set.[2]
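The two interpretations of 0x24 can be sketched as a small switch. This is only an illustration: the function names and the `legacy_ccitt_service` flag are hypothetical, and the position 0xA8 for the currency sign in the supplementary set is read off the code chart in this article rather than quoted from the standard.

```python
CURRENCY_SIGN = "\u00a4"  # the international currency sign, ¤


def decode_0x24(legacy_ccitt_service: bool = False) -> str:
    """Return the character meant by byte 0x24 in the primary set.

    Existing CCITT services may keep reading it as the currency sign;
    new applications read it as the dollar sign.
    """
    return CURRENCY_SIGN if legacy_ccitt_service else "$"


def encode_currency_sign(legacy_ccitt_service: bool = False) -> bytes:
    """Encode the currency sign for an 8-bit environment.

    Assumption: 0xA8 is the supplementary-set position of the currency
    sign, as shown in the code chart of this article.
    """
    return b"\x24" if legacy_ccitt_service else b"\xa8"


print(decode_0x24())                  # new applications
print(decode_0x24(True))              # existing CCITT services
print(encode_currency_sign().hex())
```

A real decoder would fix this choice once per service instead of passing a flag on every call.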

The supplementary set (second half) contains a selection of spacing and non-spacing graphic characters, additional symbols and some locations reserved for future standardisation.

Both of these are ISO/IEC 2022 graphic character sets, with the primary set being a 94-code set and the supplementary set being a 96-code set. In contexts where ISO 2022 code extension techniques are not in use, the primary set is designated as the G0 set and invoked over GL (0x20..0x7F), whereas the supplementary set is designated as the G2 set and is either invoked over GR (0xA0..0xFF) in an 8-bit environment, or reached by using the control code 0x19 as a single-shift in a 7-bit environment.[3] This encoding of the Single Shift Two code matches its location in ISO-IR-106.[4]
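The relationship between the 7-bit and 8-bit forms can be sketched as follows. The sketch assumes the usual ISO/IEC 2022 single-shift behaviour, in which the shift affects only the one byte that follows it; the function name `seven_to_eight_bit` is hypothetical.

```python
SS2 = 0x19  # single-shift control code used in 7-bit environments


def seven_to_eight_bit(data: bytes) -> bytes:
    """Rewrite a 7-bit stream so that G2 characters appear in GR.

    Each byte that follows SS2 is taken from the supplementary set,
    which in 8-bit form means the same byte with the high bit set.
    """
    out = bytearray()
    shifted = False
    for b in data:
        if shifted:
            if not 0x20 <= b <= 0x7F:
                raise ValueError(f"byte 0x{b:02X} cannot follow a single shift")
            out.append(b | 0x80)  # 0x20..0x7F maps onto 0xA0..0xFF
            shifted = False
        elif b == SS2:
            shifted = True
        else:
            out.append(b)
    if shifted:
        raise ValueError("single shift at end of input")
    return bytes(out)


# "caf" + SS2 + 0x42 (acute accent position) + "e"
print(seven_to_eight_bit(b"caf\x19\x42e").hex(" "))
```

In the example, the shifted byte 0x42 becomes the lead byte 0xC2, and the following letter stays in the primary set.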

The ISO/IEC 2022 escape sequence to designate the supplementary set of ISO/IEC 6937 as the G2 set is ESC . R (hex 1B 2E 52).[2][5][6] The older ISO 6937/2:1983 supplementary set is registered as a 94-code set, and designated to G2 with ESC * l (hex 1B 2A 6C).[5][7]
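As a quick check of the byte values quoted above, the two designation sequences can be assembled from their parts. The helper name `designate_g2` is hypothetical; the intermediate bytes are the ones that appear in the sequences given in the text (0x2E for the 96-code set, 0x2A for the 94-code set).

```python
ESC = 0x1B

# Intermediate byte that selects G2, keyed by the size of the set.
G2_INTERMEDIATE = {94: 0x2A, 96: 0x2E}


def designate_g2(final: str, set_size: int) -> bytes:
    """Build the escape sequence that designates a registered set as G2."""
    return bytes([ESC, G2_INTERMEDIATE[set_size], ord(final)])


print(designate_g2("R", 96).hex(" "))  # current supplementary set
print(designate_g2("l", 94).hex(" "))  # older ISO 6937/2:1983 set
```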

Two byte characters

Accented letters that are not allocated single codes in the primary or supplementary set are coded using two bytes. The first byte, the "non-spacing diacritical mark", is followed by a letter from the primary set, e.g.:

small e with acute accent (é) = [Acute]+e 

The ITU T.51 standard allocates column 4 of the supplementary set (i.e. 0xC0–0xCF when used in 8-bit format) to non-spacing diacritic characters.[2] However, ISO/IEC 6937 defines a fully specified character repertoire, mapping a list of composition sequences to ISO/IEC 10646 character names. The isolated non-spacing bytes are not included in this repertoire, although spacing variants of the diacritics not otherwise present in ASCII are included, encoded with the ASCII space as the trail byte.[5][8] Hence, only certain combinations of lead byte and follow byte conform to the ISO/IEC standard.

This repertoire is also affixed to the ITU version of the specification as Annex A, although the ITU version does not reference it from the main text. It is described as a "unified superset" of the Latin-script character repertoires.[2] It corresponds to the repertoire of ISO/IEC 10367 when the ASCII, Latin-1 (or Latin-5), Latin-2 and supplementary Latin sets are used.[5]

This system also differs from the Unicode combining character system in that the diacritic code precedes the letter (as opposed to following it), making it more similar to ANSEL.
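The difference in ordering can be illustrated with Python's standard `unicodedata` module. This is a deliberately naive sketch: the three-entry `COMBINING` mapping and the function name are assumptions made for the example, and it ignores both the repertoire restrictions and the exceptions described in this section.

```python
import unicodedata

# Assumed mapping from a few lead bytes to Unicode combining marks.
COMBINING = {
    0xC1: "\u0300",  # combining grave accent
    0xC2: "\u0301",  # combining acute accent
    0xC8: "\u0308",  # combining diaeresis
}


def naive_pair_to_unicode(lead: int, letter: int) -> str:
    """Turn (diacritic, letter) into Unicode order (letter, mark).

    T.51 sends the diacritic first; Unicode puts the combining mark
    after the base letter. NFC then composes the pair where possible.
    """
    mark = COMBINING[lead]
    return unicodedata.normalize("NFC", chr(letter) + mark)


print(naive_pair_to_unicode(0xC2, ord("e")))  # é
print(naive_pair_to_unicode(0xC8, ord("u")))  # ü
```

A mechanical swap like this does not yield ģ for the sequence 0xC2 followed by g, which is one reason a conforming decoder works from the list of permitted combinations instead.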

A minor anomaly is that Latin small letter g with cedilla (ģ) is coded as if it carried an acute accent, that is, with a 0xC2 lead byte. Because the descender of the lowercase g interferes with a cedilla, the lowercase letter is usually drawn with a turned comma above instead: Ģ ģ.

In total 13 diacritical marks can be followed by the selected characters from the primary set:

Accent              Code  Second character           Result
Grave               0xC1  AEIOUaeiou                 ÀÈÌÒÙàèìòù
Acute               0xC2  ACEILNORSUYZacegilnorsuyz  ÁĆÉÍĹŃÓŔŚÚÝŹáćéģíĺńóŕśúýź
Circumflex          0xC3  ACEGHIJOSUWYaceghijosuwy   ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ
Tilde               0xC4  AINOUainou                 ÃĨÑÕŨãĩñõũ
Macron              0xC5  AEIOUaeiou                 ĀĒĪŌŪāēīōū
Breve               0xC6  AGUagu                     ĂĞŬăğŭ
Dot                 0xC7  CEGIZcegz                  ĊĖĠİŻċėġż
Umlaut or diæresis  0xC8  AEIOUYaeiouy               ÄËÏÖÜŸäëïöüÿ
Ring                0xCA  AUau                       ÅŮåů
Cedilla             0xCB  CGKLNRSTcklnrst            ÇĢĶĻŅŖŞŢçķļņŗşţ
Double Acute        0xCD  OUou                       ŐŰőű
Ogonek              0xCE  AEIUaeiu                   ĄĘĮŲąęįų
Caron               0xCF  CDELNRSTZcdelnrstz         ČĎĚĽŇŘŠŤŽčďěľňřšťž
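The table above lends itself to a lookup-based decoder. The following is a minimal sketch rather than a complete implementation: it treats bytes below 0x80 as ASCII, handles only the two-byte sequences listed in the table, and rejects everything else, including the single-byte supplementary characters and the spacing diacritics. The names `ROWS`, `COMPOSED` and `decode_accented` are hypothetical.

```python
# Rows copied from the table: lead byte -> (follow letters, results).
ROWS = {
    0xC1: ("AEIOUaeiou", "ÀÈÌÒÙàèìòù"),
    0xC2: ("ACEILNORSUYZacegilnorsuyz", "ÁĆÉÍĹŃÓŔŚÚÝŹáćéģíĺńóŕśúýź"),
    0xC3: ("ACEGHIJOSUWYaceghijosuwy", "ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ"),
    0xC4: ("AINOUainou", "ÃĨÑÕŨãĩñõũ"),
    0xC5: ("AEIOUaeiou", "ĀĒĪŌŪāēīōū"),
    0xC6: ("AGUagu", "ĂĞŬăğŭ"),
    0xC7: ("CEGIZcegz", "ĊĖĠİŻċėġż"),
    0xC8: ("AEIOUYaeiouy", "ÄËÏÖÜŸäëïöüÿ"),
    0xCA: ("AUau", "ÅŮåů"),
    0xCB: ("CGKLNRSTcklnrst", "ÇĢĶĻŅŖŞŢçķļņŗşţ"),
    0xCD: ("OUou", "ŐŰőű"),
    0xCE: ("AEIUaeiu", "ĄĘĮŲąęįų"),
    0xCF: ("CDELNRSTZcdelnrstz", "ČĎĚĽŇŘŠŤŽčďěľňřšťž"),
}

# Flatten the rows into a (lead byte, follow byte) -> character lookup.
COMPOSED = {}
for lead, (letters, results) in ROWS.items():
    assert len(letters) == len(results), hex(lead)
    for letter, result in zip(letters, results):
        COMPOSED[(lead, ord(letter))] = result


def decode_accented(data: bytes) -> str:
    """Decode ASCII plus the two-byte sequences listed in the table."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))  # primary set, treated as ASCII here
            i += 1
        elif b in ROWS:
            if i + 1 >= len(data):
                raise ValueError("lead byte at end of input")
            key = (b, data[i + 1])
            if key not in COMPOSED:
                raise ValueError(f"0x{b:02X} 0x{data[i + 1]:02X} is not in the table")
            out.append(COMPOSED[key])
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} is outside the scope of this sketch")
    return "".join(out)


print(decode_accented(b"caf\xc2e"))  # café
print(decode_accented(b"\xc2g"))     # ģ, the anomaly noted above
```

Because the lookup is keyed on the pair of bytes, unlisted combinations such as grave followed by b are rejected, and the ģ exception needs no special casing.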

Codepage layout

The references to combining characters in the U+0300–U+036F range for the codes 0xC1–0xCF below are subject to the caveats mentioned above; the codes cannot simply be mapped to the code points listed. Also, Unicode splits 0xE2 into two characters, uppercase D with stroke (Đ) and uppercase Eth (Ð); these usually look alike, whereas the corresponding lowercase letters (0xF2 and 0xF3) look different.

The older 1988 edition of ITU T.51 defined two versions of the supplementary set, with the first version lacking the non-breaking space, soft hyphen, not sign (¬) and broken bar (¦) present in the second version. The first version was defined as an extension of the T.61 supplementary set, and the second version as an extension of the first version.[9] The current (1992) edition only includes the second version, deprecates certain characters, and updates the primary set to the current ISO-646-IRV (ASCII), although existing telematic services are permitted to retain the older behaviour.[2]

ISO/IEC 6937 or ITU T.51 (Latin)
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x
1x
2x  SP  ! " # $/¤[a] % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~
8x
9x
Ax NBSP ¡ ¢ £ $[b] ¥ #[b] § ¤ «
Bx ° ± ² ³ × µ · ÷ » ¼ ½ ¾ ¿
Cx ◌̀ ◌́ ◌̂ ◌̃ ◌̄ ◌̆ ◌̇ ◌̈ ◌̊ ◌̧ ◌̲[c] ◌̋ ◌̨ ◌̌
Dx ¹ ® © ¬ ¦
Ex Æ Đ/Ð ª Ħ [d] IJ Ŀ Ł Ø Œ º Þ Ŧ Ŋ ʼn
Fx ĸ æ đ ð ħ ı ij ŀ ł ø œ ß þ ŧ ŋ SHY
  Differences from T.61

Videotex version

The versions of the supplementary set used by the ITU T.101 standard for Videotex are based on the first supplementary set of the 1988 edition of T.51.

The default G2 set for Data Syntax 2 adds a ΅ at 0xC0, for combination with codes from a Greek primary set.[10]

The supplementary set for Data Syntax 3 adds non-spacing marks for a "vector overbar" and solidus and several semigraphic characters.[11]

ETS 300 706 version

The ETS 300 706 standard for World System Teletext bases its G2 set on ISO 6937.[12] It is a superset of the supplementary set of T.61, and a superset of the first supplementary set of the 1988 edition of T.51, but collides with the current edition of T.51 in certain positions. Diacritic codes in the ETS version are specified as being "for association with" characters from the G0 set in use,[12] such as US-ASCII or BS_viewdata. This version is shown in the chart below.

World System Teletext, Latin G2 Set (ETS 300 706:1997)[12]
0 1 2 3 4 5 6 7 8 9 A B C D E F
Ax  SP  ¡ ¢ £ $ ¥ # § ¤ «
Bx ° ± ² ³ × µ · ÷ » ¼ ½ ¾ ¿
Cx ◌̀ ◌́ ◌̂ ◌̃ ◌̄ ◌̆ ◌̇ ◌̈ ̣◌̣ ◌̊ ◌̧ ◌̲ ◌̋ ◌̨ ◌̌
Dx ¹ ® © α
Ex Æ Đ/Ð ª Ħ IJ Ŀ Ł Ø Œ º Þ Ŧ Ŋ ʼn
Fx ĸ æ đ ð ħ ı ij ŀ ł ø œ ß þ ŧ ŋ
  Differences from T.51

See also

  • ITU T.50
  • ITU T.61, a closely related character encoding for Teletex use

Footnotes

  a. Continued use for ¤ permitted for existing CCITT services only.[2]
  b. Permitted for existing CCITT services only; otherwise the ASCII representation should be used.[2]
  c. Noted in the ITU version of the standard as having existing use for underlined text, in combination with any other character including accented characters. The 1988 ITU edition includes this code,[9] but the 1992 ITU edition discourages sending it in favour of ANSI escape sequences, while noting that it should be correctly interpreted when received by applicable systems.[2] Previous editions of the ISO/IEC version of the standard also allowed combining this code with any character in the defined repertoire,[7] whereas more recent revisions do not include this code.[5]
  d. An early draft placed ȷ in this position.

References

  1. "T.51: Latin based coded character sets for telematic services". www.itu.int. Archived from the original on 2019-10-08. Retrieved 2019-11-14.
  2. CCITT (1992-09-18). Latin based coded character sets for telematic services (1992 ed.). Recommendation T.51.
  3. ITU-T (1995-08-11). Recommendation T.51 (1992) Amendment 1.
  4. ITU (1985-08-01). Teletex Primary Set of Control Functions (PDF). ITSCJ/IPSJ. ISO-IR-106.
  5. ISO/IEC JTC 1/SC 2/WG 3 (1998-04-15). WD 6937, Coded graphic character set for text communication - Latin alphabet (PDF). JTC1/SC2/N454.
  6. ISO/IEC JTC 1/SC 2/WG 3 (1991-12-15). Supplementary Set of ISO/IEC 6937:1992 (PDF). ITSCJ/IPSJ. ISO-IR-156. (The left-hand side is US-ASCII.)
  7. ISO/TC97/SC2/WG4 (1985-01-10). Supplementary Set of Latin Alphabetic and non-Alphabetic Graphic Characters (PDF). ITSCJ/IPSJ. ISO-IR-90.
  8. Petersen, J. K. (2002-05-29). The Telecommunications Illustrated Dictionary. CRC Press. p. 888. ISBN 978-1-4200-4067-8.
  9. CCITT (1988). Coded character sets for telematic services (1988 ed.). Recommendation T.51.
  10. CCITT (1988-11-01). Supplementary Set of Graphic Characters for Videotex (PDF). ITSCJ/IPSJ. ISO-IR-70.
  11. CCITT (1986-11-30). Supplementary Set of Graphic Characters for CCITT Recommendation T.101, Data Syntax III (PDF). ITSCJ/IPSJ. ISO-IR-128.
  12. ETSI (1997). "15.6.3 Latin G2 Set". Enhanced Teletext specification (PDF). p. 116. ETS 300 706.

External links

  • ITU Recommendation T.51
  • ISO pages: ISO 6937-1:1983, ISO 6937-2:1983, ISO 6937-2:1983/Add 1:1989, ISO/IEC 6937:1994, ISO/IEC 6937:2001
  • WD 6937, Coded graphic character set for text communication - Latin alphabet (Revision of ISO/IEC 6937:1994) (ISO/IEC 6937:1994 draft)
  • ISO-IR-156 (ISO-IR registration of right-hand part)
