Wikipedia

Code page 936 (Microsoft Windows)

Windows code page 936 (abbreviated MS936, Windows-936 or, ambiguously, CP936)[1] is Microsoft's legacy (pre-Unicode) character encoding for representing Simplified Chinese text on computers. It is one of the four Windows double-byte character sets (DBCSs) for East Asian languages, alongside code pages 932 (Japanese), 949 (Korean) and 950 (Traditional Chinese). It is a variant of the Mainland Chinese Guójiā Biāozhǔn Kuòzhǎn (GBK) encoding, and roughly corresponds to IBM code page 1386 (CP1386 or IBM-1386).
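
As a minimal illustration of the double-byte behaviour described above, the following Python sketch encodes a mixed string using the `cp936` label. It assumes CPython's standard library, which resolves `cp936` and `ms936` to its generic `gbk` codec; that codec is not guaranteed to match Microsoft's mapping table byte for byte.

```python
import codecs

# In CPython, the labels "cp936" and "ms936" are aliases of the "gbk" codec.
codec_name = codecs.lookup("cp936").name

text = "中文 ok"
encoded = text.encode("cp936")

# Each Chinese character takes two bytes; each ASCII character takes one.
byte_lengths = [len(ch.encode("cp936")) for ch in text]

print(codec_name)        # canonical codec name chosen by Python
print(encoded.hex(" "))  # raw bytes in hexadecimal
print(byte_lengths)
```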

Windows code page 936
MIME / IANA: GBK
Language(s): Mainly used for Simplified Chinese, but also supports Traditional Chinese, Japanese, English, Russian and (partially) Greek.
Classification: GBK variant, Extended ASCII,[a] variable-width encoding, CJK encoding
Extends: EUC-CN
Based on: GBK (GB 13000.1-93 annex)
Succeeded by: Code page 54936 (GB 18030)
  a. ^ Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.
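
The note about trail bytes has a practical consequence, which the sketch below tries to show: a byte-level search for an ASCII character can match the second byte of a double-byte character. It uses Python's `gbk` codec as a stand-in for Windows-936 (an assumption) and the character 丂 (U+4E02), which GBK encodes as 0x81 0x40.

```python
# 丂 (U+4E02) is encoded in GBK as the byte pair 0x81 0x40.
encoded = "丂".encode("gbk")
lead, trail = encoded

# The trail byte 0x40 is also the ASCII code for "@".
trail_is_ascii = trail < 0x80

# A naive byte-level search finds an "@" that is not really in the text.
naive_hit = b"@" in encoded

# Searching the decoded text gives the correct answer.
correct_hit = "@" in encoded.decode("gbk")

print(hex(lead), hex(trail), trail_is_ascii, naive_hit, correct_hit)
```

For this reason, text in this encoding is usually decoded before it is searched or split, rather than being processed as raw bytes.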

History

Originally, Windows-936 covered GB 2312 (in its EUC-CN form), but it was expanded to cover most of GBK with the release of Windows 95. The Euro sign (€), which is not defined in GBK, is encoded as the single byte 0x80 in Windows-936 and IBM-1386. Conversely, 95 characters defined in GBK 1.0 were initially not encoded in Windows-936. This was partly resolved in later versions of Windows: as of Windows 7, all GBK characters not in the Unicode BMP Private Use Area can be displayed using code page 936, but encoding the 95 characters was still not supported as of 2014.
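
Because the 0x80 Euro mapping is a Windows-936 extension rather than part of GBK, a generic GBK codec may not handle it. The following is a rough sketch of a decoder that adds it on top of Python's `gbk` codec. The function name `decode_windows936` is hypothetical, and the sketch does not try to reproduce any other difference between Windows-936 and GBK. It walks the bytes one character at a time, because 0x80 is also a valid GBK trail byte and so cannot simply be substituted wherever it appears.

```python
def decode_windows936(data: bytes) -> str:
    """Decode GBK bytes, additionally treating a standalone 0x80 as the Euro sign."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte ASCII character.
            out.append(chr(b))
            i += 1
        elif b == 0x80:
            # Windows-936 extension: single byte 0x80 is U+20AC.
            out.append("\u20ac")
            i += 1
        else:
            # Lead byte of a double-byte character; let the gbk codec
            # validate and decode the pair (raises on invalid input).
            out.append(data[i:i + 2].decode("gbk"))
            i += 2
    return "".join(out)


price = decode_windows936("价格".encode("gbk") + b"\x80100")
print(price)
```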

Windows code page 936 was superseded by code page 54936 (GB 18030), but as of 2014 it was still in widespread use. The Windows console uses code page 936 as the default code page for Simplified Chinese installations, even though part of GB 18030 was made mandatory for all software products sold in China. In 2002, the IANA Internet name GBK was registered with Windows-936's mapping,[2][3] making it the de facto GBK definition on the Internet.

Terminology

Windows code page 936 corresponds roughly to IBM code page 1386, and is a different encoding from the obsolete IBM code page 936.

The name "code page 936" is ambiguous. IBM's code page 936,[4] an obsolete IBM 5550 encoding, is also a Simplified Chinese encoding, but it uses a different encoding method for GB 2312 (Shift GB) and so is entirely incompatible with Windows code page 936. This contrasts with IBM code page 932, which is, to a first approximation,[a] a subset of Windows code page 932. International Components for Unicode, however, does not include an IBM-936 codec and uses the Windows code page for the cp936 label.[1] IBM's code page for GBK coverage is code page 1386, which is defined as a combination of the single-byte code page 1114 and the double-byte code page 1385.[5]

The concepts of "Windows-936", "GBK", "GB2312" and "EUC-CN" are sometimes conflated in software products. EUC-CN is registered with the IANA as GB2312, although it is only one specific encoding format of GB 2312: a variable-width, 8-bit, stateless one. GB 2312 also has other, less widely used encoding formats, such as HZ-GB-2312, ISO-2022-CN and the aforementioned Shift GB.
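
To make the distinction between the character set and its encoding formats concrete, the sketch below encodes the same two GB 2312 characters in EUC-CN and in HZ. It assumes Python's `gb2312` codec (which implements EUC-CN) and its `hz` codec. The HZ form is built by hand here to show the relationship: the same two-byte codes with the high bit cleared, wrapped in the `~{` and `~}` escape sequences.

```python
text = "中文"

# EUC-CN: each GB 2312 character is two bytes, both with the high bit set.
euc_cn = text.encode("gb2312")

# Clearing the high bit yields the 7-bit GB 2312 codes that HZ wraps in ~{ ... ~}.
seven_bit = bytes(b & 0x7F for b in euc_cn)
hz = b"~{" + seven_bit + b"~}"

print(euc_cn.hex(" "))   # 8-bit, stateless form
print(hz)                # 7-bit, escape-delimited form
print(hz.decode("hz"))   # decodes back to the same characters
```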

GBK is a superset of EUC-CN (although not itself an EUC code) and superseded GB 2312 long ago. Microsoft software also continued to assign the GB2312 encoding label to code page 936 even after extending it to implement GBK rather than EUC-CN. As a result, when most modern Windows-based software products offer "GB 2312" as a character encoding option, they mean partial support for GBK via Windows-936, rather than EUC-CN or other encoding formats of GB 2312. This can be observed in products such as Microsoft Internet Explorer and Notepad++.
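
One way software can cope with this labelling practice is to decode anything labelled "GB2312" with a GBK decoder, which is safe for genuine EUC-CN data because GBK is a superset. The sketch below illustrates the idea. The `LABEL_OVERRIDES` table and the `decode_labelled` helper are hypothetical names introduced for this example, and Python's `gbk` codec again stands in for Windows-936.

```python
# Hypothetical override table: treat GB 2312 labels as GBK.
LABEL_OVERRIDES = {"gb2312": "gbk", "gb_2312-80": "gbk"}


def decode_labelled(data: bytes, label: str) -> str:
    """Decode data using its declared label, widening GB 2312 labels to GBK."""
    codec = LABEL_OVERRIDES.get(label.strip().lower(), label)
    return data.decode(codec)


# 丂 (U+4E02) exists in GBK but not in GB 2312.
data = "丂".encode("gbk")

# A strict EUC-CN decoder rejects the bytes...
try:
    data.decode("gb2312")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while the widened decoder accepts them.
lenient = decode_labelled(data, "GB2312")
print(strict_ok, lenient)
```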

Footnotes

  a. ^ If the character variant swaps from 1983 are ignored.

References

  1. ^ a b "windows-936-2000 (alias cp936)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  2. ^ "Character Sets". Retrieved 3 October 2016.
  3. ^ Application of IANA Charset Registration for GBK
  4. ^ "Coded character set identifiers: CCSID 936". IBM Globalization. IBM. Archived from the original on 2014-12-01.
  5. ^ "Coded character set identifiers: CCSID 1386". IBM. Archived from the original on 2014-11-29.

External links

Windows-936:

  • Microsoft's reference for Windows-936
  • Code page file for Windows-936
  • Mapping of Windows-936 to Unicode
  • ICU demonstration of Windows-936
  • International Components for Unicode (ICU), windows-936-2000.ucm

IBM-1386:

  • ICU demonstration of IBM-1386
  • ICU mapping of IBM-1386 to Unicode
