Wikipedia

Big5

Big-5 or Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters.

Big5
MIME / IANA: Big5
Alias(es): Big-5, 大五碼
Language(s): Traditional Chinese, English
Partial support: Simplified Chinese, Greek, Japanese, Russian, Bulgarian, some IPA letters for phonetic usage.[1]
Created by: Institute for Information Industry
Classification: Extended ASCII,[a][b] variable-width encoding, DBCS, CJK encoding
Extends: ASCII[b]
Extensions: Windows-950, Big5-HKSCS, numerous others
Other related encoding(s): CNS 11643
  a. ^ Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.
  b. ^ Big5 does not specify a single-byte component; however, ASCII (or an extension) is used in practice.

The People's Republic of China (PRC), which uses simplified Chinese characters, uses the GB 18030 character set instead.

Big5 gets its name from the consortium of five companies in Taiwan that developed it.[2]

Organization

The original Big5 character set is sorted first by usage frequency, second by stroke count, lastly by Kangxi radical.

The original Big5 character set lacked many commonly used characters. To solve this problem, each vendor developed its own extension. The ETen extension became part of the current Big5 standard through popularity.

The structure of Big5 does not conform to the ISO 2022 standard, but rather bears a certain similarity to the Shift JIS encoding. It is a double-byte character set (DBCS) with the following structure:

First byte ("lead byte") 0x81 to 0xfe (or 0xa1 to 0xf9 for non-user-defined characters)
Second byte 0x40 to 0x7e, 0xa1 to 0xfe

(the prefix 0x signifying hexadecimal numbers).

Standard assignments (excluding vendor or user-defined extensions) do not use the bytes 0x7F through 0xA0, nor 0xFF, as either lead (first) or trail (second) bytes. Bytes 0xA1 through 0xFE are used for both lead and trail bytes for double-byte (Big5) codes. Bytes 0x40 through 0x7E are used as trail bytes following a lead byte, or for single-byte codes otherwise. If the second byte is not in either range, behavior is unspecified (i.e., varies from system to system). Additionally, certain variants of the Big5 character set, for example the HKSCS, use an expanded range for the lead byte, including values in the 0x81 to 0xA0 range (similar to Shift JIS), whereas others use reduced lead byte ranges (for instance, the Apple Macintosh variant uses 0xFD through 0xFF as single-byte codes, limiting the lead byte range to 0xA1 through 0xFC).[3]
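
The byte ranges described above can be expressed as a small validity check. This is a minimal sketch, not a reference implementation; the constant and function names are illustrative, and the `extended` flag is an assumption introduced here to switch between the full 0x81 to 0xFE lead range and the 0xA1 to 0xF9 range for non-user-defined characters.

```python
# Byte ranges taken from the structure described above.
# All names here are illustrative, not part of any standard API.
LEAD_STANDARD = range(0xA1, 0xFA)  # 0xA1..0xF9: non-user-defined characters
LEAD_EXTENDED = range(0x81, 0xFF)  # 0x81..0xFE: includes user-defined zones
TRAIL_LOW = range(0x40, 0x7F)      # 0x40..0x7E
TRAIL_HIGH = range(0xA1, 0xFF)     # 0xA1..0xFE


def is_lead_byte(b: int, extended: bool = True) -> bool:
    """Return True if b may start a double-byte Big5 code."""
    return b in (LEAD_EXTENDED if extended else LEAD_STANDARD)


def is_trail_byte(b: int) -> bool:
    """Return True if b may be the second byte of a Big5 code."""
    return b in TRAIL_LOW or b in TRAIL_HIGH


def is_big5_pair(lead: int, trail: int, extended: bool = True) -> bool:
    """Return True if (lead, trail) is structurally a Big5 code."""
    return is_lead_byte(lead, extended) and is_trail_byte(trail)
```

Note that the check is purely structural: it says whether a byte pair fits the Big5 layout, not whether any particular variant assigns a character to it.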

The numerical value of an individual Big5 code is frequently given as a 4-digit hexadecimal number, which describes the two bytes that make up the Big5 code as if they were a big-endian representation of a 16-bit number. For example, the Big5 code for a full-width space, which consists of the bytes 0xa1 0x40, is usually written as 0xa140 or just A140.
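
As a short illustration of this notation, the sketch below packs a lead and trail byte into the 4-digit form and splits it again. The helper names `big5_code` and `split_big5_code` are hypothetical, chosen only for this example.

```python
def big5_code(lead: int, trail: int) -> int:
    """Combine two bytes into the conventional 16-bit big-endian value."""
    return (lead << 8) | trail


def split_big5_code(code: int) -> tuple[int, int]:
    """Split a 16-bit Big5 value back into (lead, trail) bytes."""
    return code >> 8, code & 0xFF


code = big5_code(0xA1, 0x40)   # full-width space
print(f"{code:04X}")           # prints A140

# The same value via the standard library's big-endian conversion.
assert int.from_bytes(bytes([0xA1, 0x40]), "big") == code
```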

Strictly speaking, the Big5 encoding contains only DBCS characters. In practice, however, Big5 codes are always used together with an unspecified, system-dependent single-byte character set (ASCII, or an 8-bit character set such as code page 437), so Big5-encoded text contains a mix of DBCS characters and single-byte characters. Bytes in the range 0x00 to 0x7f that are not part of a double-byte character are assumed to be single-byte characters. (For a more detailed description of this problem, see the discussion on "The Matching SBCS" below.)

The meaning of non-ASCII single bytes outside the permitted values that are not part of a double-byte character varies from system to system. In old MS-DOS-based systems, they are likely to be displayed as 8-bit characters; in modern systems, they are likely to either give unpredictable results or generate an error.
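
The mix of single-byte and double-byte units can be made concrete with a small scanner. This is a sketch under stated assumptions: it treats bytes 0x00 to 0x7F outside a pair as single-byte characters, accepts lead bytes 0x81 to 0xFE, and labels anything else "invalid", which is only one of the possible system-dependent behaviours described above. The function name `split_big5` is hypothetical.

```python
from typing import Iterator


def split_big5(data: bytes) -> Iterator[tuple[str, bytes]]:
    """Yield ('single' | 'double' | 'invalid', raw_bytes) units."""
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            # Not preceded by a lead byte, so a single-byte character.
            yield ("single", data[i:i + 1])
            i += 1
        elif (0x81 <= b <= 0xFE and i + 1 < len(data)
              and (0x40 <= data[i + 1] <= 0x7E
                   or 0xA1 <= data[i + 1] <= 0xFE)):
            # Lead byte followed by a valid trail byte.
            yield ("double", data[i:i + 2])
            i += 2
        else:
            # Stray high byte or bad trail byte: behaviour is
            # system-dependent; this sketch merely flags it.
            yield ("invalid", data[i:i + 1])
            i += 1


units = list(split_big5(b"A\xa4\x41B"))
# 0x41 ('A' in ASCII) is consumed as a trail byte after 0xA4.
```

The example input shows why Big5 is not Extended ASCII in the strictest sense: the byte 0x41 appears once as a single-byte character and once as the trail byte of a double-byte code.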

A more detailed look at the organization

In the original Big5, the encoding is compartmentalized into different zones:

0x8140 to 0xA0FE Reserved for user-defined characters 造字
0xA140 to 0xA3BF "Graphical characters" 圖形碼
0xA3C0 to 0xA3FE Reserved, not for user-defined characters
0xA440 to 0xC67E Frequently used characters 常用字
0xC6A1 to 0xC8FE Reserved for user-defined characters
0xC940 to 0xF9D5 Less frequently used characters 次常用字
0xF9D6 to 0xFEFE Reserved for user-defined characters
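
The zone list above lends itself to a simple lookup. The following is a minimal sketch that maps a 16-bit Big5 value to its zone in the original layout; the table name `ZONES` and the function `big5_zone` are illustrative, and the trail-byte check is an added assumption so that values such as 0xA400, which fall between zones numerically but are not valid codes, return `None`.

```python
# (first code, last code, zone description), copied from the list above.
ZONES = [
    (0x8140, 0xA0FE, "reserved for user-defined characters"),
    (0xA140, 0xA3BF, "graphical characters"),
    (0xA3C0, 0xA3FE, "reserved, not for user-defined characters"),
    (0xA440, 0xC67E, "frequently used characters"),
    (0xC6A1, 0xC8FE, "reserved for user-defined characters"),
    (0xC940, 0xF9D5, "less frequently used characters"),
    (0xF9D6, 0xFEFE, "reserved for user-defined characters"),
]


def big5_zone(code: int):
    """Return the zone description for a Big5 code, or None."""
    trail = code & 0xFF
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return None  # trail byte outside both permitted ranges
    for first, last, name in ZONES:
        if first <= code <= last:
            return name
    return None


print(big5_zone(0xA140))  # prints: graphical characters
```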

The "graphical characters" actually comprise punctuation marks, partial punctuation marks (e.g., half of a dash, half of an ellipsis; see below), dingbats, foreign characters, and other special characters (e.g., presentational "full width" forms, digits for Suzhou numerals, zhuyin fuhao, etc.)

In most vendor extensions, extended characters are placed in the various zones reserved for user-defined characters, each of which is normally regarded as associated with the preceding zone. For example, additional "graphical characters" (e.g., punctuation marks) would be expected to be placed in the 0xa3c0–0xa3fe range, and additional logograms would be placed in either the 0xc6a1–0xc8fe or the 0xf9d6–0xfefe range. Sometimes, this is not possible due to the large number of extended characters to be added; for example, Cyrillic letters and Japanese kana have been placed in the zone associated with "frequently-used characters".

What a Big5 code actually encodes

An individual Big5 code does not always represent a complete semantic unit. The Big5 codes of logograms are always logograms, but codes in the "graphical characters" section are not always complete "graphical characters". What Big5 encodes are particular graphical representations of characters or part of characters that happen to fit in the space taken by two monospaced ASCII characters. This is a property of double-byte character sets as normally used in CJK (Chinese, Japanese, and Korean) computing, and is not a unique problem of Big5.

(The above needs some historical context, as it is theoretically incorrect. When text-mode personal computing was still the norm, characters were normally represented as single bytes, and each character took one position on the screen. There was therefore a practical reason to insist that double-byte characters take up two positions on the screen: off-the-shelf, American-made software would then be usable without modification on a DBCS-based system. If a character could take an arbitrary number of screen positions, software that assumes one byte of text takes one screen position would produce incorrect output. Of course, if a computer never had to deal with a text screen, the manufacturer would not enforce this artificial restriction; the Apple Macintosh is an example. Nevertheless, the encoding itself had to be designed to work correctly on text-screen-based systems.)

To illustrate this point, consider the Big5 code 0xa14b (…). To English speakers this looks like an ellipsis and the Unicode standard identifies it as such; however, in Chinese, the ellipsis consists of six dots that fit in the space of two Chinese characters (……), so in fact there is no Big5 code for the Chinese ellipsis, and the Big5 code 0xa14b just represents half of a Chinese ellipsis. It represents only half of an ellipsis because the whole ellipsis should take the space of two Chinese characters, and in many DBCS systems one DBCS character must take exactly the space of one Chinese character.
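
The ellipsis example can be reproduced with a short sketch. It assumes Python's built-in `big5` codec, whose mapping table may differ from other Big5 variants in some positions; for this code it yields the Unicode character identified as an ellipsis.

```python
import unicodedata

# One Big5 code, 0xA14B: Unicode treats it as a complete ellipsis.
half = bytes([0xA1, 0x4B]).decode("big5")
print(unicodedata.name(half))  # prints HORIZONTAL ELLIPSIS

# A Chinese ellipsis (six dots) therefore needs two Big5 codes,
# i.e. four bytes, and decodes to two Unicode characters.
full = bytes([0xA1, 0x4B, 0xA1, 0x4B]).decode("big5")
print(len(full))  # prints 2
```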

Characters encoded in Big5 do not always represent things that can be readily used in plain text files. One example is the "citation mark" (0xa1ca, ﹋), which, when used, must be typeset under the title of a literary work. Another is the Suzhou numerals, a form of scientific notation that requires the number to be laid out in a 2-D form consisting of at least two rows.

The Matching SBCS

In practice, Big5 cannot be used without a matching single-byte character set (SBCS), mostly for compatibility reasons. However, as with other CJK DBCS character sets, the SBCS to use has never been specified. Big5 has always been defined as a DBCS, though when used it must be paired with a suitable, unspecified SBCS and is therefore used as what some people call an MBCS; nevertheless, Big5 by itself, as defined, is strictly a DBCS.

The SBCS to use being unspecified implies that the SBCS used can theoretically vary from system to system. Nowadays, ASCII is the only possible SBCS one would use. However, in old DOS-based systems, Code Page 437—with its extra special symbols in the control code area including position 127—was much more common. Yet, on a Macintosh system with the Chinese Language Kit, or on a Unix system running the cxterm terminal emulator, the SBCS paired with Big5 would not be Code Page 437.

Outside the valid range of Big5, the old DOS-based systems would routinely interpret things according to the SBCS that is paired with Big5 on that system. In such systems, characters 127 to 160, for example, were very likely not avoided because they would produce invalid Big5, but used because they would be valid characters in Code Page 437.

The modern characterization of Big5 as an MBCS consisting of the DBCS of Big5 plus the SBCS of ASCII is therefore historically incorrect and potentially flawed, as the choice of the matching SBCS was, and theoretically still is, quite independent of the flavour of Big5 being used.

History

The inability of ASCII to support large character sets, such as those used for Chinese, Japanese and Korean, led governments and industry to find creative solutions to enable their languages to be rendered on computers. A variety of ad hoc and usually proprietary input methods led to efforts to develop a standard system. As a result, Big5 encoding was defined by the Institute for Information Industry of Taiwan in 1984. The name "Big5" recognizes that the standard emerged from the collaboration of five of Taiwan's largest IT firms: Acer (宏碁), MiTAC (神通), JiaJia (佳佳), ZERO ONE Technology (零壹 or 01tech), and First International Computer (FIC) (大眾).

Big5 was rapidly popularized in Taiwan and worldwide among Chinese who used the traditional Chinese character set through its adoption in several commercial software packages, notably the E-TEN Chinese DOS input system (ETen Chinese System). The Republic of China government declared Big5 as its standard in the mid-1980s since it was, by then, the de facto standard for using traditional Chinese on computers.

Extensions

The original Big-5 includes only CJK logograms from the Charts of Standard Forms of Common National Characters (4808 characters) and Less-Than-Common National Characters (6343 characters), but not characters used in people's names, place names, dialects, chemistry, or biology, nor Japanese kana. As a result, much Big-5-supporting software includes extensions to address these gaps.

The plethora of variations makes UTF-8 or UTF-16 a more consistent encoding choice for modern use.

Vendor extensions

ETen extensions

In the ETen (倚天) Chinese operating system, the following code points are added to support some characters present in the IBM 5550's code page but absent from generic Big5:

  • A3C0–A3E0: 33 control characters.
  • C6A1–C875: circle 1–10, bracket 1–10, Roman numerals 1–9 (i–ix), CJK radical glyphs, Japanese hiragana, Japanese katakana, Cyrillic characters
  • F9D6–F9FE: the characters '碁', '銹', '恒', '裏', '墻', '粧' and '嫺', followed by 34 additional semigraphic symbols.

In some versions of ETen, there are extra graphical symbols and simplified Chinese characters.

Microsoft code pages

Microsoft (微軟) created its own version of Big5 extension as Code page 950 for use with Microsoft Windows, which supports the F9D6-F9FE code points from ETEN's extensions. In some versions of Windows, the euro currency symbol is mapped to Big-5 code point A3E1.

After installing Microsoft's HKSCS patch on top of traditional Chinese Windows (or any version of Windows 2000 and above with proper language pack), applications using code page 950 automatically use a hidden code page 951 table. The table supports all code points in HKSCS-2001, except for the compatibility code points specified by the standard.[4]

IBM code pages

In contrast to Microsoft's code page 950, IBM's CCSID 950 comprises single byte code page 1114 (CCSID 1114) and double byte code page 947 (CCSID 947).[5][6][7] It incorporates ETEN extensions for lead bytes 0xA3,[8] 0xC6,[9][10] 0xC7[11] and 0xC8,[9][12] while omitting those with lead byte 0xF9 (which Microsoft includes), mapping them instead to the Private Use Area as user-defined characters.[9][13] It also includes two non-ETEN extension regions with trail bytes 0x81–A0, i.e. outside the usual Big5 trail byte range but similar to the Big5+ trail byte range: area 5 has lead bytes 0xF2–F9 and contains IBM-selected characters, while area 9 has lead bytes 0x81–8C and is a user-defined region.[14]

IBM refers to the euro sign update of their Big-5 variant as CCSID 1370, which includes both single-byte (0x80) and double-byte (0xA3E1) euro signs.[15] It comprises single byte code page 1114 (CCSID 5210) and double byte code page 947 (CCSID 21427).[15][16][17] For better compatibility with Microsoft's variant in IBM Db2, IBM also defines the pure double-byte Code page 1372[18] and the associated variable-width CCSID 1373, which corresponds to Microsoft's code page 950.[19]

IBM assigns CCSID 5471 to the HKSCS-2001 Big5 code page (with CPGID 1374 as CCSID 5470 as the double byte component),[20][21] CCSID 9567 to the HKSCS-2004 code page (with CPGID 1374 as CCSID 9566 as the double byte component),[22] and CCSID 13663 to the HKSCS-2008 code page (with CPGID 1374 as CCSID 13662 as the double byte component),[23] while CCSID 1375 is assigned to a growing HKSCS code page, currently equivalent to CCSID 13663.[24]

ChinaSea font

ChinaSea fonts (中國海字集)[25] are Traditional Chinese fonts made by ChinaSea. The fonts are rarely sold separately, but are bundled with other products, such as the Chinese version of Microsoft Office 97. The fonts support Japanese kana, kokuji, and other characters missing in Big-5. As a result, the ChinaSea extensions have become more popular than the government-supported extensions. Some Hong Kong BBSes used encodings in ChinaSea fonts before the introduction of HKSCS.

'Sakura' font

The 'Sakura' font (日和字集 Sakura Version) was developed in Hong Kong and is designed to be compatible with HKSCS. It adds support for kokuji and proprietary dingbats (including Doraemon) not found in HKSCS.

Unicode-at-on

Unicode-at-on (Unicode補完計畫), formerly BIG5 extension, extends BIG-5 by altering code page tables, but uses the ChinaSea extensions starting with version 2. However, with the bankruptcy of ChinaSea, late development, and the increasing popularity of HKSCS and Unicode (the project is not compatible with HKSCS), the success of this extension is limited at best.

Despite the problems, characters previously mapped to Unicode Private Use Area are remapped to the standardized equivalents when exporting characters to Unicode format.

OPG

The web sites of the Oriental Daily News and Sun Daily, belonging to the Oriental Press Group Limited (東方報業集團有限公司) in Hong Kong, used a downloadable font with a different Big-5 extension coding than the HKSCS.

Official extensions

Taiwan Ministry of Education font

The Taiwan Ministry of Education supplied its own font, the Taiwan Ministry of Education font (臺灣教育部造字檔) for use internally.

Taiwan Council of Agriculture font

Taiwan's Council of Agriculture, Executive Yuan, introduced a 133-character custom font, the Taiwan Council of Agriculture font (臺灣農委會常用中文外字集), which includes 84 characters from the fish radical and 7 from the bird radical.

Big5+

The Chinese Foundation for Digitization Technology (中文數位化技術推廣委員會) introduced Big5+ in 1997, which used over 20000 code points to incorporate all CJK logograms in Unicode 1.1. However, the extra code points exceeded the original Big-5 definition (Big5+ uses high byte values 81-FE and low byte values 40-7E and 80-FE), preventing it from being installed on Microsoft Windows without new codepage files.
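
The size of the two code spaces can be compared with simple arithmetic. This sketch only counts positions in the lead-byte by trail-byte grid, using the byte ranges quoted in this article; it says nothing about how many positions are actually assigned, and the variable names are illustrative.

```python
# Lead bytes 0x81..0xFE are shared by both layouts.
lead_count = 0xFE - 0x81 + 1                                   # 126

# Original Big5 trail bytes: 0x40..0x7E and 0xA1..0xFE.
big5_trails = (0x7E - 0x40 + 1) + (0xFE - 0xA1 + 1)            # 63 + 94

# Big5+ trail bytes: 0x40..0x7E and 0x80..0xFE.
big5plus_trails = (0x7E - 0x40 + 1) + (0xFE - 0x80 + 1)        # 63 + 127

big5_space = lead_count * big5_trails
big5plus_space = lead_count * big5plus_trails

print(big5_space)      # prints 19782
print(big5plus_space)  # prints 23940
```

The widened trail range 0x80 to 0xA0 is what takes Big5+ past the 20000 mark and outside the original two-byte structure.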

Big-5E

To allow Windows users to use custom fonts, the Chinese Foundation for Digitization Technology introduced Big-5E, which added 3954 characters (in three blocks of code points: 8E40-A0FE, 8140-86DF, 86E0-875C) and removed the Japanese kana from the ETEN extension. Unlike Big-5+, Big5E extends Big-5 within its original definition. Mac OS X 10.3 and later supports Big-5E in the fonts LiHei Pro (儷黑 Pro.ttf) and LiSong Pro (儷宋 Pro.ttf).
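
The figure of 3954 characters can be checked against the three blocks by counting only valid Big5 trail bytes (0x40 to 0x7E and 0xA1 to 0xFE, 157 per lead byte). This is a sketch under the assumption that every valid position in each block is used; the helper names are hypothetical.

```python
TRAILS_PER_ROW = 157  # 63 low trail bytes + 94 high trail bytes


def trail_index(trail: int) -> int:
    """Position of a trail byte within its row (0..156)."""
    if 0x40 <= trail <= 0x7E:
        return trail - 0x40
    if 0xA1 <= trail <= 0xFE:
        return trail - 0xA1 + 63
    raise ValueError(f"invalid Big5 trail byte: {trail:#04x}")


def linear_index(code: int) -> int:
    """Map a Big5 code to a gap-free index starting at 0x8140."""
    return ((code >> 8) - 0x81) * TRAILS_PER_ROW + trail_index(code & 0xFF)


def block_size(first: int, last: int) -> int:
    """Number of valid Big5 codes from first to last, inclusive."""
    return linear_index(last) - linear_index(first) + 1


blocks = [(0x8E40, 0xA0FE), (0x8140, 0x86DF), (0x86E0, 0x875C)]
sizes = [block_size(first, last) for first, last in blocks]
print(sizes, sum(sizes))  # prints [2983, 911, 60] 3954
```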

Big5-2003

The Chinese Foundation for Digitization Technology made a Big5 definition and put it into CNS 11643 in note form, making it part of the official standard in Taiwan.

Big5-2003 incorporates all Big-5 characters introduced in the 1984 ETEN extensions (code points A3C0-A3E0, C6A1-C7F2, and F9D6-F9FE) and the Euro symbol. Cyrillic characters were not included because the authority claimed CNS 11643 does not include such characters.

CDP

Academia Sinica created a Chinese Data Processing font (漢字構形資料庫) in the late 1990s. Its latest release, version 2.5, includes 112,533 characters, somewhat fewer than the Mojikyo fonts.

HKSCS

Hong Kong also adopted Big5 for character encoding. However, written Cantonese has its own characters not available in the normal Big5 character set. To solve this problem, the Hong Kong Government created two Big5 extensions: the Government Chinese Character Set (GCCS) in 1995 and the Hong Kong Supplementary Character Set (HKSCS) in 1999. The Hong Kong extensions were commonly distributed as a patch. They are still distributed as a patch by Microsoft, but a full Unicode font is also available from the Hong Kong Government's web site.

There are two encoding schemes of HKSCS: one encoding scheme is for the Big-5 coding standard and the other is for the ISO 10646 standard. Subsequent to the initial release, there are also HKSCS-2001 and HKSCS-2004. The HKSCS-2004 is aligned technically with the ISO/IEC 10646:2003 and its Amendment 1 published in April 2004 by the International Organization for Standardization (ISO).

HKSCS includes all the characters from the common ETEN extension, plus some characters from simplified Chinese, place names, people's names, and Cantonese phrases (including profanity).

As of 2020, the most recent edition of HKSCS is HKSCS-2016; however, the last edition of HKSCS to encode all of its characters in Big5 was HKSCS-2008, while the characters added in more recent editions are mapped to ISO 10646 / Unicode only (as a CJK Unified Ideographs horizontal glyph extension where appropriate).[26] Similarly to Hong Kong's situation, some characters needed by Macao are included in neither Big5 nor HKSCS; hence, the Macao Supplementary Character Set (MSCS) was developed, comprising characters not found in Big5 or HKSCS. It, too, is not encoded in Big5. The first batch of 121 MSCS characters was submitted for inclusion in or mapping to Unicode in 2009,[27] and the first final version of MSCS was established in 2020.[26]

Kana and Cyrillic

There are two major Big5 extension layouts for encoding kana, Russian Cyrillic and list markers in the range 0xC6A1 through 0xC875. These are not compatible with one another.[28] They are compared in the table below.

The ETEN layout of kana and Cyrillic is also used by the HKSCS[29] (including HTML5)[30] and Unicode-At-On[31] variants, as well as by IBM's version of code page 950,[32][33][34] and the ETEN layout of the kana (with Cyrillic omitted) is also used by the Big5-2003 variant.[35] The published mapping files for Windows-950 include neither, and this Big5 range is mapped to the Private Use Area by the Windows-950 implementation from International Components for Unicode.[36] Python's built-in cp950 codec uses the BIG5.TXT layout.[37] The classic Mac OS version includes neither layout.[3]

References

  1. ^ "Big5 (Traditional Chinese) character code table".
  2. ^ "Character Sets". chinesemac.org. Retrieved 2021-08-31.
  3. ^ a b Apple, Inc (2005-04-04) [1996-06-31]. Map (external version) from Mac OS Chinese Traditional encoding to Unicode 3.0 and later. Unicode Consortium.
  4. ^ . Archived from the original on 2007-02-22. Retrieved 2006-09-27.
  5. ^ . Archived from the original on 2014-12-02.
  6. ^ . Archived from the original on 2016-03-27.
  7. ^ . Archived from the original on 2014-12-01.
  8. ^ "Lead byte A3: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  9. ^ a b c Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "Chinese Character Encoding for Internet Messages". Requests for Comments. IETF. doi:10.17487/rfc1922. RFC 1922.
  10. ^ "Lead byte C6: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  11. ^ "Lead byte C7: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  12. ^ "Lead byte C8: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  13. ^ "Lead byte F9: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  14. ^ "IBM Traditional Chinese Graphic Character Set for IBM BIG-5 Code" (PDF). IBM. 1999. C-H 3-3220-131 1999-04.
  15. ^ a b . Archived from the original on 2016-03-27.
  16. ^ . Archived from the original on 2014-11-29.
  17. ^ . Archived from the original on 2016-03-27.
  18. ^ . IBM Globalization - Code page identifiers. Archived from the original on 2016-03-17.
  19. ^ "ibm-1373_P100-2002". ICU Demonstration - Converter Explorer. International Components for Unicode.
  20. ^ . IBM Globalization - Coded character set identifiers. IBM. Archived from the original on 2014-11-29.
  21. ^ International Components for Unicode (ICU), ibm-5471_P100-2006.ucm, 2007-05-09
  22. ^ . IBM Globalization - Coded character set identifiers. IBM. Archived from the original on 2014-11-29.
  23. ^ . IBM Globalization - Coded character set identifiers. IBM. Archived from the original on 2014-11-29.
  24. ^ . IBM Globalization - Coded character set identifiers. IBM. Archived from the original on 2014-11-29.
  25. ^ 黃國書. . ISU FTP. Archived from the original on 2005-03-19. Retrieved 2016-12-05.
  26. ^ a b Macao Special Administrative Region Government (2020-06-11). "Submission of Macao's Vertical Extension (UNC Characters), Horizontal Extension, and IVSes Registration for MSCS" (PDF). ISO/IEC JTC 1/SC 2/WG 2 IRGN 2430.
  27. ^ Computer Chinese Characters Encoding Workgroup (2009-06-12). (PDF). ISO/IEC JTC 1/SC 2/WG 2 IRGN 1580. Archived from the original (PDF) on 2015-01-04.
  28. ^ Lunde, Ken (1996-07-12). "2.3.1: BIG FIVE". CJK.INF Version 2.1.
  29. ^ "Big5HKSCS-2004". Mozilla Taiwan.
  30. ^ van Kesteren, Anne. "big5". Encoding Standard. WHATWG.
  31. ^ "UAO 2.41 b2u". Mozilla Taiwan.
  32. ^ "Lead byte C6: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  33. ^ "Lead byte C7: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  34. ^ "Lead byte C8: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  35. ^ "Big5-2003 b2u". Mozilla Taiwan.
  36. ^ IBM; Unicode Consortium (2002-12-03). "windows-950-2000". International Components for Unicode.
  37. ^ Script showing output of cp950 codec for lead bytes 0xC6 and 0xC7
  38. ^ Unicode Consortium (2015-12-02) [1994-02-11]. BIG5 to Unicode table (complete).
  39. ^ "Big5-ETen vs Unicode mapping table". Mozilla Taiwan. 2002-02-24.
  • Lunde, Ken (1999). CJKV Information Processing (First ed.). O'Reilly and Associates, Inc. ISBN 978-1-56592-224-2.

External links

  • Mozilla and the Big5 Family of Encodings: an overview of Big5 encodings with code charts for each extension and relevant Firefox bugs (Traditional Chinese)
  • Big5 character code table
  • by Christian Wittern
  • CNS 11643 official web site has information about the Big5e character set (an extended version of Big5) in the "Chinese Information Code" section.
  • Big5 introduction Contains differences between extensions.
  • Graphical View of Big5 in ICU's Converter Explorer
  • 教育部標準字體 Download page of the Taiwan Ministry of Education fonts
  • 文獻處理實驗室 Download pages of the CDP font
  • Hong Kong Supplementary Character Set Info Downloadable HKSCS documents & font
  • 香港參考宋體 Download page of Dynalab(華康科技有限公司)'s HKSCS font.
  • Microsoft's Windows Codepage 950 (Traditional Chinese Big5)
  • on.cc Download page of the OPG font
  • 中國海字集視窗版(v3.0)下載網頁 Download page of the ChinaSea font

big5, other, uses, five, disambiguation, this, article, multiple, issues, please, help, improve, discuss, these, issues, talk, page, learn, when, remove, these, template, messages, this, article, needs, additional, citations, verification, please, help, improv. For other uses see Big Five disambiguation This article has multiple issues Please help improve it or discuss these issues on the talk page Learn how and when to remove these template messages This article needs additional citations for verification Please help improve this article by adding citations to reliable sources Unsourced material may be challenged and removed Find sources Big5 news newspapers books scholar JSTOR January 2021 Learn how and when to remove this template message This article s tone or style may not reflect the encyclopedic tone used on Wikipedia See Wikipedia s guide to writing better articles for suggestions June 2013 Learn how and when to remove this template message Learn how and when to remove this template message Big 5 or Big5 is a Chinese character encoding method used in Taiwan Hong Kong and Macau for traditional Chinese characters Big5MIME IANABig5Alias es Big 5 大五碼Language s Traditional Chinese EnglishPartial support Simplified Chinese Greek Japanese Russian Bulgarian some of IPA letters for phonetic usage 1 Created byInstitute for Information IndustryClassificationExtended ASCII a b variable width encoding DBCS CJK encodingExtendsASCII b ExtensionsWindows 950 Big5 HKSCS numerous othersOther related encoding s CNS 11643 Not in the strictest sense of the term as ASCII bytes can appear as trail bytes a b Big5 does not specify a single byte component however ASCII or an extension is used in practice vteThe People s Republic of China PRC which uses simplified Chinese characters uses the GB 18030 character set instead Big5 gets its name from the consortium of five companies in Taiwan that developed it 2 Contents 1 Organization 1 1 A more detailed look at the organization 1 2 What 
a Big5 code actually encodes 1 3 The Matching SBCS 2 History 3 Extensions 3 1 Vendor extensions 3 1 1 ETen extensions 3 1 2 Microsoft code pages 3 1 3 IBM code pages 3 1 4 ChinaSea font 3 1 5 Sakura font 3 1 6 Unicode at on 3 1 7 OPG 3 2 Official extensions 3 2 1 Taiwan Ministry of Education font 3 2 2 Taiwan Council of Agriculture font 3 2 3 Big5 3 2 4 Big 5E 3 2 5 Big5 2003 3 2 6 CDP 3 2 7 HKSCS 4 Kana and Cyrillic 5 See also 6 References 7 External linksOrganization EditThe original Big5 character set is sorted first by usage frequency second by stroke count lastly by Kangxi radical The original Big5 character set lacked many commonly used characters To solve this problem each vendor developed its own extension The ETen extension became part of the current Big5 standard through popularity The structure of Big5 does not conform to the ISO 2022 standard but rather bears a certain similarity to the Shift JIS encoding It is a double byte character set DBCS with the following structure First byte lead byte 0x81 to 0xfe or 0xa1 to 0xf9 for non user defined characters Second byte 0x40 to 0x7e 0xa1 to 0xfe the prefix 0x signifying hexadecimal numbers Standard assignments excluding vendor or user defined extensions do not use the bytes 0x7F through 0xA0 nor 0xFF as either lead first or trail second bytes Bytes 0xA1 through 0xFE are used for both lead and trail bytes for double byte Big5 codes Bytes 0x40 through 0x7E are used as trail bytes following a lead byte or for single byte codes otherwise If the second byte is not in either range behavior is unspecified i e varies from system to system Additionally certain variants of the Big5 character set for example the HKSCS use an expanded range for the lead byte including values in the 0x81 to 0xA0 range similar to Shift JIS whereas others use reduced lead byte ranges for instance the Apple Macintosh variant uses 0xFD through 0xFF as single byte codes limiting the lead byte range to 0xA1 through 0xFC 3 The numerical value of 
individual Big5 codes are frequently given as a 4 digit hexadecimal number which describes the two bytes that comprise the Big5 code as if the two bytes were a big endian representation of a 16 bit number For example the Big5 code for a full width space which are the bytes 0xa1 0x40 is usually written as 0xa140 or just A140 Strictly speaking the Big5 encoding contains only DBCS characters However in practice the Big5 codes are always used together with an unspecified system dependent single byte character set ASCII or an 8 bit character set such as code page 437 so that you will find a mix of DBCS characters and single byte characters in Big5 encoded text Bytes in the range 0x00 to 0x7f that are not part of a double byte character are assumed to be single byte characters For a more detailed description of this problem please see the discussion on The Matching SBCS below The meaning of non ASCII single bytes outside the permitted values that are not part of a double byte character varies from system to system In old MSDOS based systems they are likely to be displayed as 8 bit characters in modern systems they are likely to either give unpredictable results or generate an error A more detailed look at the organization Edit In the original Big5 the encoding is compartmentalized into different zones 0x8140 to 0xA0FE Reserved for user defined characters 造字0xA140 to 0xA3BF Graphical characters 圖形碼0xA3C0 to 0xA3FE Reserved not for user defined characters0xA440 to 0xC67E Frequently used characters 常用字0xC6A1 to 0xC8FE Reserved for user defined characters0xC940 to 0xF9D5 Less frequently used characters 次常用字0xF9D6 to 0xFEFE Reserved for user defined charactersThe graphical characters actually comprise punctuation marks partial punctuation marks e g half of a dash half of an ellipsis see below dingbats foreign characters and other special characters e g presentational full width forms digits for Suzhou numerals zhuyin fuhao etc In most vendor extensions extended characters are 
placed in the various zones reserved for user defined characters each of which are normally regarded as associated with the preceding zone For example additional graphical characters e g punctuation marks would be expected to be placed in the 0xa3c0 0xa3fe range and additional logograms would be placed in either the 0xc6a1 0xc8fe or the 0xf9d6 0xfefe range Sometimes this is not possible due to the large number of extended characters to be added for example Cyrillic letters and Japanese kana have been placed in the zone associated with frequently used characters What a Big5 code actually encodes Edit An individual Big5 code does not always represent a complete semantic unit The Big5 codes of logograms are always logograms but codes in the graphical characters section are not always complete graphical characters What Big5 encodes are particular graphical representations of characters or part of characters that happen to fit in the space taken by two monospaced ASCII characters This is a property of double byte character sets as normally used in CJK Chinese Japanese and Korean computing and is not a unique problem of Big5 The above might need some explanation by putting it in historical perspective as it is theoretically incorrect Back when text mode personal computing was still the norm characters were normally represented as single bytes and each character takes one position on the screen There was therefore a practical reason to insist that double byte characters must take up two positions on the screen namely that off the shelf American made software would then be usable without modification in a DBCS based system If a character can take an arbitrary number of screen positions software that assumes that one byte of text takes one screen position would produce incorrect output Of course if a computer never had to deal with the text screen the manufacturer would not enforce this artificial restriction the Apple Macintosh is an example Nevertheless the encoding 
itself must be designed so that it works correctly on text-screen-based systems.

To illustrate this point, consider the Big5 code 0xa14b. To English speakers this looks like an ellipsis, and the Unicode standard identifies it as such; however, in Chinese the ellipsis consists of six dots that fit in the space of two Chinese characters, so in fact there is no Big5 code for the Chinese ellipsis, and the Big5 code 0xa14b just represents half of a Chinese ellipsis. It represents only half of an ellipsis because the whole ellipsis should take the space of two Chinese characters, and in many DBCS systems one DBCS character must take exactly the space of one Chinese character.

Characters encoded in Big5 do not always represent things that can be readily used in plain text files. An example is the "citation mark" (0xa1ca), which, when used, is required to be typeset under the title of literary works. Another example is the Suzhou numerals, a form of scientific notation that requires the number to be laid out in a 2-D form consisting of at least two rows.

The Matching SBCS

In practice, Big5 cannot be used without a matching single-byte character set (SBCS), mostly for compatibility reasons. However, as in the case of other CJK DBCS character sets, the SBCS to use has never been specified. Big5 has always been defined as a DBCS, though when used it must be paired with a suitable, unspecified SBCS and is therefore used as what some people call an MBCS; nevertheless, Big5 by itself, as defined, is strictly a DBCS.

The SBCS to use being unspecified implies that the SBCS used can theoretically vary from system to system. Nowadays, ASCII is the only possible SBCS one would use. However, in old DOS-based systems, Code Page 437, with its extra special symbols in the control code area including position 127, was much more common. Yet on a Macintosh system with the Chinese Language Kit, or on a Unix system running the cxterm terminal emulator, the SBCS paired with Big5 would not be Code Page 437.

Outside the
valid range of Big5, the old DOS-based systems would routinely interpret things according to the SBCS that is paired with Big5 on that system. In such systems, characters 127 to 160, for example, were very likely not avoided because they would produce invalid Big5, but used because they would be valid characters in Code Page 437.

The modern characterization of Big5 as an MBCS consisting of the DBCS of Big5 plus the SBCS of ASCII is therefore historically incorrect and potentially flawed, as the choice of the matching SBCS was, and theoretically still is, quite independent of the flavour of Big5 being used.

History

The inability of ASCII to support large character sets such as those used for Chinese, Japanese and Korean led governments and industry to find creative solutions to enable their languages to be rendered on computers. A variety of ad hoc and usually proprietary input methods led to efforts to develop a standard system. As a result, the Big5 encoding was defined by the Institute for Information Industry of Taiwan in 1984. The name "Big5" is in recognition that the standard emerged from the collaboration of five of Taiwan's largest IT firms: Acer (宏碁); MiTAC (神通); JiaJia (佳佳); ZERO ONE Technology (零壹 or 01tech); and First International Computer (FIC, 大眾).

Big5 was rapidly popularized in Taiwan and worldwide among Chinese who used the traditional Chinese character set through its adoption in several commercial software packages, notably the E-TEN Chinese DOS input system (ETen Chinese System). The Republic of China government declared Big5 as its standard in the mid-1980s since it was, by then, the de facto standard for using traditional Chinese on computers.

Extensions

The original Big-5 only includes CJK logograms from the Charts of Standard Forms of Common National Characters (4808 characters) and Less-Than-Common National Characters (6343 characters), but not letters from people's names, place names, dialects, chemistry, biology, or Japanese kana. As a result, much Big-5-supporting software includes extensions to
address the problems.

The plethora of variations makes UTF-8 or UTF-16 a more consistent code page for modern use.

Vendor extensions

ETen extensions

In the ETen (倚天) Chinese operating system, the following code points are added to support some characters present in the IBM 5550's code page but absent from generic Big5:

A3C0–A3E0: 33 control characters.
C6A1–C875: circle 1–10, bracket 1–10, Roman numerals 1–9 (i–ix), CJK radical glyphs, Japanese hiragana, Japanese katakana, Cyrillic characters.
F9D6–F9FE: the characters 碁, 銹, 恒, 裏, 墻, 粧, and 嫺, followed by 34 additional semigraphic symbols.

In some versions of ETen, there are extra graphical symbols and simplified Chinese characters.

Microsoft code pages

Main article: Code page 950

Microsoft (微軟) created its own version of the Big5 extension as Code page 950 for use with Microsoft Windows, which supports the F9D6–F9FE code points from ETEN's extensions. In some versions of Windows, the euro currency symbol is mapped to Big-5 code point A3E1.

After installing Microsoft's HKSCS patch on top of traditional Chinese Windows (or any version of Windows 2000 and above with the proper language pack), applications using code page 950 automatically use a hidden code page 951 table. The table supports all code points in HKSCS-2001, except for the compatibility code points specified by the standard.[4]

IBM code pages

In contrast to Microsoft's code page 950, IBM's CCSID 950 comprises single-byte code page 1114 (CCSID 1114) and double-byte code page 947 (CCSID 947).[5][6][7] It incorporates ETEN extensions for lead bytes 0xA3,[8] 0xC6,[9][10] 0xC7[11] and 0xC8,[9][12] while omitting those with lead byte 0xF9 (which Microsoft includes), mapping them instead to the Private Use Area as user-defined characters.[9][13] It also includes two non-ETEN extension regions with trail bytes 0x81–A0, i.e. outside the usual Big5 trail byte range but similar to the Big5+ trail byte range: area 5 has lead bytes 0xF2–F9 and contains IBM-selected characters, while area 9 has lead bytes 0x81–8C and is a
user-defined region.[14]

IBM refers to the euro-sign update of its Big-5 variant as CCSID 1370, which includes both single-byte (0x80) and double-byte (0xA3E1) euro signs.[15] It comprises single-byte code page 1114 (CCSID 5210) and double-byte code page 947 (CCSID 21427).[15][16][17] For better compatibility with Microsoft's variant in IBM Db2, IBM also defines the pure double-byte Code page 1372[18] and the associated variable-width CCSID 1373, which corresponds to Microsoft's code page 950.[19]

IBM assigns CCSID 5471 to the HKSCS-2001 Big5 code page (with CPGID 1374 as CCSID 5470 as the double-byte component),[20][21] CCSID 9567 to the HKSCS-2004 code page (with CPGID 1374 as CCSID 9566 as the double-byte component),[22] and CCSID 13663 to the HKSCS-2008 code page (with CPGID 1374 as CCSID 13662 as the double-byte component),[23] while CCSID 1375 is assigned to a growing HKSCS code page, currently equivalent to CCSID 13663.[24]

ChinaSea font

ChinaSea fonts (中國海字集)[25] are Traditional Chinese fonts made by ChinaSea. The fonts are rarely sold separately, but are bundled with other products, such as the Chinese version of Microsoft Office 97. The fonts support Japanese kana, kokuji, and other characters missing in Big-5. As a result, the ChinaSea extensions have become more popular than the government-supported extensions. Some Hong Kong BBSes had used encodings in ChinaSea fonts before the introduction of HKSCS.

Sakura font

The Sakura font (日和字集 Sakura Version) was developed in Hong Kong and is designed to be compatible with HKSCS. It adds support for kokuji and proprietary dingbats (including Doraemon) not found in HKSCS.

Unicode-at-on

Unicode-at-on (Unicode補完計畫), formerly BIG5 extension, extends BIG-5 by altering code page tables, but uses the ChinaSea extensions starting with version 2. However, with the bankruptcy of ChinaSea, late development, and the increasing popularity of HKSCS and Unicode (the project is not compatible with HKSCS), the success of this extension is limited at best.

Despite the problems,
characters previously mapped to the Unicode Private Use Area are remapped to the standardized equivalents when exporting characters to Unicode format.

OPG

The web sites of the Oriental Daily News and Sun Daily, belonging to the Oriental Press Group Limited (東方報業集團有限公司) in Hong Kong, used a downloadable font with a different Big-5 extension coding than the HKSCS.

Official extensions

Taiwan Ministry of Education font

The Taiwan Ministry of Education supplied its own font, the Taiwan Ministry of Education font (臺灣教育部造字檔), for internal use.

Taiwan Council of Agriculture font

The Council of Agriculture of Taiwan's Executive Yuan introduced a 133-character custom font, the Taiwan Council of Agriculture font (臺灣農委會常用中文外字集), that includes 84 characters from the fish radical and 7 from the bird radical.

Big5+

The Chinese Foundation for Digitization Technology (中文數位化技術推廣委員會) introduced Big5+ in 1997, which used over 20000 code points to incorporate all CJK logograms in Unicode 1.1. However, the extra code points exceeded the original Big-5 definition (Big5+ uses high byte values 81–FE and low byte values 40–7E and 80–FE), preventing it from being installed on Microsoft Windows without new codepage files.

Big-5E

To allow Windows users to use custom fonts, the Chinese Foundation for Digitization Technology introduced Big-5E, which added 3954 characters (in three blocks of code points: 8E40–A0FE, 8140–86DF, 86E0–875C) and removed the Japanese kana from the ETEN extension. Unlike Big-5+, Big5E extends Big-5 within its original definition. Mac OS X 10.3 and later supports Big-5E in the fonts LiHei Pro (儷黑 Pro.ttf) and LiSong Pro (儷宋 Pro.ttf).

Big5-2003

The Chinese Foundation for Digitization Technology made a Big5 definition and put it into CNS 11643 in note form, making it part of the official standard in Taiwan.

Big5-2003 incorporates all Big-5 characters introduced in the 1984 ETEN extensions (code points A3C0–A3E0, C6A1–C7F2, and F9D6–F9FE) and the Euro symbol. Cyrillic characters were not
included because the authority claimed CNS 11643 does not include such characters.

CDP

The Academia Sinica made a Chinese Data Processing font (漢字構形資料庫) in the late 1990s; the latest release, version 2.5, included 112,533 characters, somewhat fewer than the Mojikyo fonts.

HKSCS

Main article: Hong Kong Supplementary Character Set

Hong Kong also adopted Big5 for character encoding. However, written Cantonese has its own characters not available in the normal Big5 character set. To solve this problem, the Hong Kong Government created the Big5 extensions Government Chinese Character Set (GCCS) in 1995 and Hong Kong Supplementary Character Set in 1999. The Hong Kong extensions were commonly distributed as a patch. They are still being distributed as a patch by Microsoft, but a full Unicode font is also available from the Hong Kong Government's web site.

There are two encoding schemes of HKSCS: one is for the Big-5 coding standard and the other is for the ISO 10646 standard. Subsequent to the initial release, there are also HKSCS-2001 and HKSCS-2004. HKSCS-2004 is aligned technically with ISO/IEC 10646:2003 and its Amendment 1, published in April 2004 by the International Organization for Standardization (ISO).

HKSCS includes all the characters from the common ETEN extension, plus some characters from simplified Chinese, place names, people's names, and Cantonese phrases (including profanity).

As of 2020, the most recent edition of HKSCS is HKSCS-2016; however, the last edition of HKSCS to encode all of its characters in Big5 was HKSCS-2008, while the characters added in more recent editions are mapped to ISO 10646 / Unicode only (as a CJK Unified Ideographs horizontal glyph extension where appropriate).[26] Additionally, similarly to Hong Kong's situation, there are also characters that are needed by Macao but are included in neither Big5 nor HKSCS; hence, the Macao Supplementary Character Set was developed, comprising characters not found in Big5 or HKSCS. This, however, is also not
encoded in Big5. The first batch of 121 MSCS characters was submitted for inclusion in or mapping to Unicode in 2009,[27] and the first final version of MSCS was established in 2020.[26]

Kana and Cyrillic

There are two major Big5 extension layouts for encoding kana, Russian Cyrillic and list markers in the range 0xC6A1 through 0xC875. These are not compatible with one another.[28] They are compared in the table below. The ETEN layout of kana and Cyrillic is also used by the HKSCS[29] (including HTML5)[30] and Unicode-At-On[31] variants, as well as by IBM's version of code page 950,[32][33][34] and the ETEN layout of the kana (with Cyrillic omitted) is also used by the Big5-2003 variant.[35] The published mapping files for Windows-950 include neither, and this Big5 range is mapped to the Private Use Area by the Windows-950 implementation from International Components for Unicode.[36] Python's built-in cp950 codec uses the BIG5.TXT layout.[37] The classic Mac OS version includes neither layout.[3]

Big5 codes 0xC6A1 through 0xC875
Big5 code  BIG5.TXT layout[38]  ETEN layout[39]
0xC6A1 ヾ 0xC6A2 ゝ 0xC6A3 ゞ 0xC6A4 々 0xC6A5 ぁ 0xC6A6 あ 0xC6A7 ぃ 0xC6A8 い 0xC6A9 ぅ 0xC6AA う 0xC6AB ぇ 0xC6AC え 0xC6AD ぉ 0xC6AE お 0xC6AF か 0xC6B0 が 0xC6B1 き 0xC6B2 ぎ 0xC6B3 く 0xC6B4 ぐ 0xC6B5 け 0xC6B6 げ 0xC6B7 こ 0xC6B8 ご 0xC6B9 さ 0xC6BA ざ 0xC6BB し 0xC6BC じ 0xC6BD す 0xC6BE ず 0xC6BF せ 丶0xC6C0 ぜ 丿0xC6C1 そ 亅0xC6C2 ぞ 亠0xC6C3 た 冂0xC6C4 だ 冖0xC6C5 ち 冫0xC6C6 ぢ 勹0xC6C7 っ 匸0xC6C8 つ 卩0xC6C9 づ 厶0xC6CA て 夊0xC6CB で 宀0xC6CC と 巛0xC6CD ど 0xC6CE な 广0xC6CF に 廴0xC6D0 ぬ 彐0xC6D1 ね 彡0xC6D2 の 攴0xC6D3 は 无0xC6D4 ば 疒0xC6D5 ぱ 癶0xC6D6 ひ 辵0xC6D7 び 隶0xC6D8 ぴ 0xC6D9 ふ ˆ0xC6DA ぶ ヽ0xC6DB ぷ ヾ0xC6DC へ ゝ0xC6DD べ ゞ0xC6DE ぺ 0xC6DF ほ 仝0xC6E0 ぼ 々0xC6E1 ぽ 〆0xC6E2 ま 0xC6E3 み ー0xC6E4 む 0xC6E5 め 0xC6E6 も 0xC6E7 ゃ ぁ0xC6E8 や あ0xC6E9 ゅ ぃ0xC6EA ゆ い0xC6EB ょ ぅ0xC6EC よ う0xC6ED ら ぇ0xC6EE り え0xC6EF る ぉ0xC6F0 れ お0xC6F1 ろ か0xC6F2 ゎ が0xC6F3 わ き0xC6F4 ゐ ぎ0xC6F5 ゑ く0xC6F6 を ぐ0xC6F7 ん け0xC6F8 ァ げ0xC6F9 ア こ0xC6FA ィ ご0xC6FB イ さ0xC6FC ゥ ざ0xC6FD ウ し0xC6FE ェ じ0xC740 エ 
す0xC741 ォ ず0xC742 オ せ0xC743 カ ぜ0xC744 ガ そ0xC745 キ ぞ0xC746 ギ た0xC747 ク だ0xC748 グ ち0xC749 ケ ぢ0xC74A ゲ っ0xC74B コ つ0xC74C ゴ づ0xC74D サ て0xC74E ザ で0xC74F シ と0xC750 ジ ど0xC751 ス な0xC752 ズ に0xC753 セ ぬ0xC754 ゼ ね0xC755 ソ の0xC756 ゾ は0xC757 タ ば0xC758 ダ ぱ0xC759 チ ひ0xC75A ヂ び0xC75B ッ ぴ0xC75C ツ ふ0xC75D ヅ ぶ0xC75E テ ぷ0xC75F デ へ0xC760 ト べ0xC761 ド ぺ0xC762 ナ ほ0xC763 ニ ぼ0xC764 ヌ ぽ0xC765 ネ ま0xC766 ノ み0xC767 ハ む0xC768 バ め0xC769 パ も0xC76A ヒ ゃ0xC76B ビ や0xC76C ピ ゅ0xC76D フ ゆ0xC76E ブ ょ0xC76F プ よ0xC770 ヘ ら0xC771 ベ り0xC772 ペ る0xC773 ホ れ0xC774 ボ ろ0xC775 ポ ゎ0xC776 マ わ0xC777 ミ ゐ0xC778 ム ゑ0xC779 メ を0xC77A モ ん0xC77B ャ ァ0xC77C ヤ ア0xC77D ュ ィ0xC77E ユ イ0xC7A1 ョ ゥ0xC7A2 ヨ ウ0xC7A3 ラ ェ0xC7A4 リ エ0xC7A5 ル ォ0xC7A6 レ オ0xC7A7 ロ カ0xC7A8 ヮ ガ0xC7A9 ワ キ0xC7AA ヰ ギ0xC7AB ヱ ク0xC7AC ヲ グ0xC7AD ン ケ0xC7AE ヴ ゲ0xC7AF ヵ コ0xC7B0 ヶ ゴ0xC7B1 D サ0xC7B2 E ザ0xC7B3 Yo シ0xC7B4 Zh ジ0xC7B5 Z ス0xC7B6 I ズ0xC7B7 J セ0xC7B8 K ゼ0xC7B9 L ソ0xC7BA M ゾ0xC7BB U タ0xC7BC F ダ0xC7BD H チ0xC7BE C ヂ0xC7BF Ch ッ0xC7C0 Sh ツ0xC7C1 Sh ヅ0xC7C2 テ0xC7C3 Y デ0xC7C4 ト0xC7C5 E ド0xC7C6 Yu ナ0xC7C7 Ya ニ0xC7C8 a ヌ0xC7C9 b ネ0xC7CA v ノ0xC7CB g ハ0xC7CC d バ0xC7CD e パ0xC7CE yo ヒ0xC7CF zh ビ0xC7D0 z ピ0xC7D1 i フ0xC7D2 j ブ0xC7D3 k プ0xC7D4 l ヘ0xC7D5 m ベ0xC7D6 n ペ0xC7D7 o ホ0xC7D8 p ボ0xC7D9 r ポ0xC7DA s マ0xC7DB t ミ0xC7DC u ム0xC7DD f メ0xC7DE h モ0xC7DF c ャ0xC7E0 ch ヤ0xC7E1 sh ュ0xC7E2 sh ユ0xC7E3 ョ0xC7E4 y ヨ0xC7E5 ラ0xC7E6 e リ0xC7E7 yu ル0xC7E8 ya レ0xC7E9 ロ0xC7EA ヮ0xC7EB ワ0xC7EC ヰ0xC7ED ヱ0xC7EE ヲ0xC7EF ン0xC7F0 ヴ0xC7F1 ヵ0xC7F2 ヶ0xC7F3 A0xC7F4 B0xC7F5 V0xC7F6 G0xC7F7 D0xC7F8 E0xC7F9 Yo0xC7FA Zh0xC7FB Z0xC7FC I0xC7FD not used J0xC7FE not used K0xC840 not used L0xC841 not used M0xC842 not used N0xC843 not used O0xC844 not used P0xC845 not used R0xC846 not used S0xC847 not used T0xC848 not used U0xC849 not used F0xC84A not used H0xC84B not used C0xC84C not used Ch0xC84D not used Sh0xC84E not used Sh0xC84F not used 0xC850 not used Y0xC851 not used 0xC852 not used E0xC853 not used Yu0xC854 not used Ya0xC855 not used a0xC856 not used b0xC857 not used v0xC858 not used g0xC859 not used d0xC85A not 
used e0xC85B not used yo0xC85C not used zh0xC85D not used z0xC85E not used i0xC85F not used j0xC860 not used k0xC861 not used l0xC862 not used m0xC863 not used n0xC864 not used o0xC865 not used p0xC866 not used r0xC867 not used s0xC868 not used t0xC869 not used u0xC86A not used f0xC86B not used h0xC86C not used c0xC86D not used ch0xC86E not used sh0xC86F not used sh0xC870 not used 0xC871 not used y0xC872 not used 0xC873 not used e0xC874 not used yu0xC875 not used ya

See also

Unicode
Han unification
Chinese input methods for computers

References

1. ^ Big5 (Traditional Chinese) character code table.
2. ^ "Character Sets". chinesemac.org. Retrieved 2021-08-31.
3. ^ a b Apple Inc. (2005-04-04) [1996-06-31]. Map (external version) from Mac OS Chinese Traditional encoding to Unicode 3.0 and later. Unicode Consortium.
4. ^ 狗爺語錄, Blog Archive: What is Code Page 951 (CP951). Archived from the original on 2007-02-22. Retrieved 2006-09-27.
5. ^ CCSID 950 information document. Archived from the original on 2014-12-02.
6. ^ CCSID 1114 information document. Archived from the original on 2016-03-27.
7. ^ CCSID 947 information document. Archived from the original on 2014-12-01.
8. ^ Lead byte A3, ibm-950_P110-1999. ICU Demonstration: Converter Explorer. International Components for Unicode.
9. ^ a b c Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). Chinese Character Encoding for Internet Messages. Requests for Comments. IETF. doi:10.17487/rfc1922. RFC 1922.
10. ^ Lead byte C6, ibm-950_P110-1999. ICU Demonstration: Converter Explorer. International Components for Unicode.
11. ^ Lead byte C7, ibm-950_P110-1999. ICU Demonstration: Converter Explorer. International Components for Unicode.
12. ^ Lead byte C8, ibm-950_P110-1999. ICU Demonstration: Converter Explorer. International Components for Unicode.
13. ^ Lead byte F9, ibm-950_P110-1999. ICU Demonstration: Converter Explorer. International Components for Unicode.
14. ^ IBM Traditional Chinese Graphic Character Set for IBM BIG-5 Code (PDF). IBM. 1999. C-H 3-3220-131 1999-04.
15. ^ a b CCSID 1370 information document. Archived from the original on 2016-03-27.
16. ^ CCSID 
5210 information document. Archived from the original on 2014-11-29.
17. ^ CCSID 21427 information document. Archived from the original on 2016-03-27.
18. ^ CPGID 01372: MS T-Chinese Big-5 (Special for DB2). IBM Globalization: Code page identifiers. Archived from the original on 2016-03-17.
19. ^ ibm-1373_P100-2002. ICU Demonstration: Converter Explorer. International Components for Unicode.
20. ^ CCSID 5471: Mixed Big-5 ext for HKSCS-2001. IBM Globalization: Coded character set identifiers. IBM. Archived from the original on 2014-11-29.
21. ^ International Components for Unicode (ICU). ibm-5471_P100-2006.ucm. 2007-05-09.
22. ^ CCSID 9567: Mixed Big-5 ext for HKSCS-2004. IBM Globalization: Coded character set identifiers. IBM. Archived from the original on 2014-11-29.
23. ^ CCSID 13663: Mixed Big-5 ext for HKSCS-2008. IBM Globalization: Coded character set identifiers. IBM. Archived from the original on 2014-11-29.
24. ^ CCSID 1375: Mixed Big-5 ext for HKSCS. IBM Globalization: Coded character set identifiers. IBM. Archived from the original on 2014-11-29.
25. ^ 黃國書. Chinasea 1.0 中國海字集. ISU FTP. Archived from the original on 2005-03-19. Retrieved 2016-12-05.
26. ^ a b Macao Special Administrative Region Government (2020-06-11). Submission of Macao's Vertical Extension / UNC Characters / Horizontal Extension and IVSes Registration for MSCS (PDF). ISO/IEC JTC 1/SC 2/WG 2 IRGN 2430.
27. ^ Computer Chinese Characters Encoding Workgroup (2009-06-12). Submission of Characters from Macao Information Systems Character Set (PDF). ISO/IEC JTC 1/SC 2/WG 2 IRGN 1580. Archived from the original (PDF) on 2015-01-04.
28. ^ Lunde, Ken (1996-07-12). "2.3.1: BIG FIVE". CJK.INF Version 2.1.
29. ^ Big5HKSCS-2004. Mozilla Taiwan.
30. ^ van Kesteren, Anne. "big5". Encoding Standard. WHATWG.
31. ^ UAO 2.41 b2u. Mozilla Taiwan.
32. ^ Lead byte C6, ibm-950_P110-1999. ICU Demonstration: Converter Explorer. International Components for Unicode.
33. ^ Lead byte C7, ibm-950_P110-1999. ICU Demonstration: Converter Explorer. International Components for Unicode.
34. ^ Lead byte C8, ibm-950_P110-1999. ICU Demonstration: Converter Explorer. International Components for Unicode.
35. ^ Big5-2003 b2u. Mozilla Taiwan.
36. ^ IBM; Unicode Consortium (2002-12-03). windows-950-2000. International Components for Unicode.
37. ^ Script showing output of cp950 codec for lead bytes 0xC6 and 0xC7.
38. ^ Unicode Consortium (2015-12-02) [1994-02-11]. BIG5 to Unicode table (complete).
39. ^ Big5 (ETen) vs Unicode mapping table. Mozilla Taiwan. 2002-02-24.

Lunde, Ken (1999). CJKV Information Processing (First ed.). O'Reilly and Associates, Inc. ISBN 978-1-56592-224-2.

External links

Mozilla and the Big5 Family of Encodings: an overview of Big5 encodings, with code charts for each extension and relevant Firefox bugs
Traditional Chinese (Big5) character code table
Chinese character codes: an update, by Christian Wittern
CNS 11643 official web site: has information about the Big5e character set, an extended version of Big5, in the "Chinese Information Code" section
Big5 introduction: contains differences between extensions
Graphical View of Big5 in ICU's Converter Explorer
教育部標準字體: download page of the Taiwan Ministry of Education fonts
文獻處理實驗室: download pages of the CDP font
Hong Kong Supplementary Character Set Info: downloadable HKSCS documents and font
香港參考宋體: download page of Dynalab (華康科技有限公司)'s HKSCS font
Microsoft's Windows Codepage 950 (Traditional Chinese Big5)
on.cc: download page of the OPG font
中國海字集視窗版 v3.0 下載網頁: download page of the ChinaSea font
Big5 Codeset Overview
