Shift JIS

Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS, known as PCK in Solaris contexts)[2][3] is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1. As of October 2022, 0.2% of all web pages used Shift JIS, a decline from 1.3% in July 2014.[4]

Shift JIS
MIME / IANA: Shift_JIS
Alias(es): MS_Kanji,[1] PCK[2][3]
Language(s): Primarily Japanese, but also supporting English, Russian, Bulgarian, Greek
Standard: JIS X 0208:1997 Appendix 1
Classification: Extended ISO 646,[a] variable-width encoding, CJK encoding
Extends: JIS X 0201 8-bit format
Transforms / Encodes: JIS X 0208
Succeeded by: Shift_JIS-2004 (JIS), Windows-31J (web)
  a. ^ Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.

Shift JIS is the second-most popular character encoding for Japanese websites, used by 5.6% of sites in the .jp domain. UTF-8 is used by 94.4% of Japanese websites.[5][6]

Description

Shift JIS is based on character sets defined within JIS standards JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double-byte characters). The lead bytes for the double-byte characters are "shifted" around the 63 half-width katakana characters in the single-byte range 0xA1 to 0xDF. The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign (U+00A5) at 0x5C and an overline (U+203E) at 0x7E in place of ASCII's backslash and tilde respectively. The single-byte characters from 0xA1 to 0xDF map to the half-width katakana characters found in JIS X 0201.

HTML written in Shift JIS can still be interpreted to some extent when incorrectly tagged as ASCII, and the charset declaration can still be read when it sits at the top of the document itself, because the characters that delimit HTML tags and fields (<, >, /, ", &, ;) are coded by the same single bytes as in ASCII, and those bytes never appear inside two-byte sequences.

Shift JIS can be used in string literals in programming languages such as C, but two things must be taken into consideration. Firstly, the escape character 0x5C, normally a backslash, is the half-width yen sign (¥) in Shift JIS. A programmer who is aware of this can write printf("ハローワールド¥n"); (where ハローワールド is "Hello, world" and ¥n is an escape sequence), assuming the I/O system supports Shift JIS output. Secondly, the 0x5C byte causes problems when it appears as the second byte of a two-byte character, because it is interpreted as the start of an escape sequence, which corrupts the character unless the byte is followed by another 0x5C.
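As a small illustration of the second point, the sketch below uses Python's built-in shift_jis codec (chosen here for convenience; the text itself discusses C) to show two common characters whose trail byte is 0x5C. The helper name trail_byte is made up for this example.

```python
def trail_byte(ch: str) -> int:
    """Return the second byte of a character that Shift JIS encodes in two bytes."""
    encoded = ch.encode("shift_jis")
    if len(encoded) != 2:
        raise ValueError("not a double-byte Shift JIS character")
    return encoded[1]


# Both of these characters end in 0x5C, the byte C treats as an escape prefix.
for ch in ("ソ", "表"):
    print(ch, ch.encode("shift_jis").hex(), trail_byte(ch) == 0x5C)
```

A C compiler that is unaware of Shift JIS would read the 0x5C in either character as the beginning of an escape sequence.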

Shift JIS requires an 8-bit clean medium for transmission. It is fully backwards compatible with the legacy JIS X 0201 single-byte encoding, meaning that it supports half-width katakana and that any valid JIS X 0201 string is also a valid Shift JIS string. For two-byte characters, however, Shift JIS only guarantees that the first byte has its high bit set (0x80–0xFF); the second byte can be either high or low. The appearance of byte values 0x40–0x7E as second bytes of code words makes reliable Shift JIS detection difficult, because the same values are used for ASCII characters. Because the same byte value can be either a first or a second byte, string searches are also difficult: a simple search can match the second byte of one character together with the first byte of the next, a pair that is not a real character. String search algorithms must therefore be tailor-made for Shift JIS.
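The search problem can be sketched in a few lines of Python. The function names below are hypothetical, and the lead-byte test assumes the ranges 0x81–0x9F and 0xE0–0xFC, which covers standard Shift JIS plus the common extensions; it is an illustration, not a complete search routine.

```python
def is_lead_byte(b: int) -> bool:
    """True if b starts a two-byte sequence (assumed ranges 0x81-0x9F, 0xE0-0xFC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def find_sjis(haystack: bytes, needle: bytes) -> int:
    """Return the first offset where needle matches on a character boundary, or -1."""
    i = 0
    while i < len(haystack):
        if haystack.startswith(needle, i):
            return i
        # Step over one whole character: two bytes after a lead byte, else one.
        i += 2 if is_lead_byte(haystack[i]) else 1
    return -1


text = "アB".encode("shift_jis")  # bytes 0x83 0x41 0x42; the trail byte of ア equals ASCII "A"
print(text.find(b"A"))           # naive byte search reports a false match at offset 1
print(find_sjis(text, b"A"))     # boundary-aware search reports -1
print(find_sjis(text, b"B"))     # the real "B" is found at offset 2
```

Walking the string from the start is what makes the boundary known; a search that begins at an arbitrary offset cannot tell a trail byte from a single-byte character.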

On the other hand, the competing 8-bit format EUC-JP, which does not support single-byte half-width katakana, allows for a much cleaner and more direct conversion to and from JIS X 0208 code points, as all bytes with the high bit set are parts of a double-byte character and all codes from the ASCII range represent single-byte characters.

Unicode also avoids some of the disadvantages of Shift JIS. It does not have ambiguous versions: new characters are assigned to unused places by a single organization, while private use areas are clearly designated, will never be used for standard characters, and are rarely needed due to the comprehensive nature of Unicode. Shift JIS, by contrast, has been extended by several companies working in parallel. UTF-8-encoded Unicode is backwards compatible with ASCII, including at 0x5C, and does not have the string search problem.

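For a double-byte JIS sequence $j_1 j_2$,[7] the transformation to the corresponding Shift JIS bytes $s_1 s_2$ is:

$$s_1 = \begin{cases} \left\lfloor \frac{j_1 + 1}{2} \right\rfloor + 112 & \text{if } 33 \leq j_1 \leq 94 \\ \left\lfloor \frac{j_1 + 1}{2} \right\rfloor + 176 & \text{if } 95 \leq j_1 \leq 126 \end{cases}$$

$$s_2 = \begin{cases} j_2 + 31 + \left\lfloor \frac{j_2}{96} \right\rfloor & \text{if } j_1 \text{ is odd} \\ j_2 + 126 & \text{if } j_1 \text{ is even} \end{cases}$$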
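The transformation from a JIS byte pair to a Shift JIS byte pair can be sketched in Python as follows. The function name jis_to_sjis is an assumption made for this example; the constants 112, 176, 31 and 126 are those of the standard mapping, with both input bytes in the range 33 to 126.

```python
def jis_to_sjis(j1: int, j2: int) -> tuple[int, int]:
    """Map a two-byte JIS X 0208 sequence (j1, j2) to Shift JIS bytes (s1, s2)."""
    if not (33 <= j1 <= 126 and 33 <= j2 <= 126):
        raise ValueError("JIS bytes must be in the range 0x21-0x7E")
    # Two JIS rows share one lead byte; the offset skips the katakana block.
    s1 = (j1 + 1) // 2 + (112 if j1 <= 94 else 176)
    if j1 % 2 == 1:
        # Odd rows use trail bytes 0x40-0x9E, skipping 0x7F.
        s2 = j2 + 31 + j2 // 96
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        s2 = j2 + 126
    return s1, s2


print([hex(b) for b in jis_to_sjis(0x25, 0x22)])  # ア, an odd row
print([hex(b) for b in jis_to_sjis(0x30, 0x21)])  # 亜, an even row
```

The result can be cross-checked against an existing codec, for example by decoding bytes(jis_to_sjis(0x30, 0x21)) with Python's shift_jis codec.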

Multiple versions

 
Figure: Euler diagram comparing repertoires of JIS X 0208, JIS X 0212, JIS X 0213, Windows-31J, the Microsoft standard repertoire and Unicode.

Figure: Relationship between Shift_JIS variants on the PC and related encodings, including intersections and other subsets. Names given are descriptive.

Many different versions of Shift JIS exist. There are two areas for expansion:

Firstly, JIS X 0208 does not fill the whole 94×94 space encoded for it in Shift JIS, therefore there is room for more characters here – these are really extensions to JIS X 0208 rather than to Shift JIS itself.

Secondly, Shift JIS has more encoding space than is needed for JIS X 0201 and JIS X 0208 (see § Shift JIS byte map below), and this space can be, and is, used for yet more characters.

Windows-932 / Windows-31J

The most popular extension is Windows code page 932 (a CCSID also used for IBM's extension to Shift JIS), which is registered with the IANA as "Windows-31J",[1] separately from Shift JIS. This was popularized by Microsoft, although Microsoft itself does not recognize the Windows-31J name and instead calls that variation "shift_jis".[8][9] IBM's code page 943 includes the same double-byte codes as Microsoft's code page 932, while IBM's code page 932 includes fewer extensions (excluding those which Microsoft incorporates from NEC), and retains the character order from the 1978 edition of JIS X 0208, rather than implementing the character variant swaps from the 1983 standard.[10]

Windows-31J assigns 0x5C to U+005C REVERSE SOLIDUS (the backslash), and 0x7E to U+007E TILDE, following US-ASCII.[11] However, most localised fonts on Windows display U+005C as a Yen sign for JIS X 0201 compatibility.[12][13] It includes several extensions, namely "NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)",[1] in addition to setting some encoding space aside for end user definition.[14]
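The effect of the NEC row 13 extension can be observed with Python's built-in codecs, where cp932 implements the Windows variant and shift_jis the stricter JIS X 0208 repertoire. The helper can_encode is a hypothetical name used only in this sketch.

```python
def can_encode(ch: str, codec: str) -> bool:
    """Report whether ch is representable in the named Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The circled digit one is an NEC special character in row 13.
print("①".encode("cp932").hex())    # 8740
print(can_encode("①", "cp932"))      # available in the Windows variant
print(can_encode("①", "shift_jis"))  # absent from the stricter codec
```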

Windows codepage 932 is the version used in the W3C/WHATWG encoding standard used by HTML5, which includes the "formerly proprietary extensions from IBM and NEC" from Windows-31J in its table for JIS X 0208,[15] and also treats the label "shift_jis" interchangeably with "windows-31j" with the intent of being "compatible with deployed content".[16]

MacJapanese

The version of Shift-JIS originating from the classic Mac OS (known as x-mac-japanese, Code page 10001[8] or MacJapanese) assigned the tilde to 0x7E (following US-ASCII, not JIS X 0201 which assigns the overline here), but the Yen sign to 0x5C (as in JIS X 0201 and standard Shift JIS). It also extended JIS X 0201 by assigning the backslash to 0x80 (corresponding to 0x5C in US-ASCII), the non-breaking space to 0xA0, the copyright sign to 0xFD, the trademark symbol to 0xFE and the half-width horizontal ellipsis to 0xFF. It also added extended double byte characters; including 53 vertical presentation forms in the Shift_JIS range 0xEB41–0xED96, at 84 JIS rows down from their canonical forms, and 260 special characters in the Shift_JIS range 0x8540–0x886D.[17] This variant was introduced in KanjiTalk version 7.[18]

However, certain Mac OS typefaces used other variants. Sai Mincho and Chu Gothic use a "PostScript" variant of MacJapanese, which included additional vertical presentation forms and a different set of extended special characters, based on the NEC special characters, some of which were only available in the printer versions of the fonts.[17] Older versions of Maru Gothic and Hon Mincho from System 7.1 encoded vertical presentation forms at 10 (not 84) JIS rows down from their canonical forms, and did not include the special character extensions; this was subsequently changed.[17][19] The typical variant used with KanjiTalk version 6 placed the vertical presentation forms 10 rows down, and also used the NEC extension layout for row 13.[20]

Shift_JISx0213 and Shift_JIS-2004

Shift_JIS-2004
Alias(es): Shift_JISx0213
Language(s): Japanese, Ainu, English, Russian
Standard: JIS X 0213
Extends: Shift_JIS (1997), JIS X 0201 (8-bit)
Transforms / Encodes: JIS X 0213
Preceded by: Shift_JIS (1997)

The newer JIS X 0213 standard defines an extended variant of Shift_JIS referred to as Shift_JISx0213 (in a previous version of the standard) or Shift_JIS-2004. It is a superset of standard Shift JIS.[21]

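In order to represent the allocated rows on both planes of JIS X 0213, Shift_JIS-2004 uses the following method of mapping codepoints.[22]

$$s_1 = \begin{cases} \left\lfloor \frac{k + 257}{2} \right\rfloor & \text{if } m = 1 \text{ and } 1 \leq k \leq 62 \\ \left\lfloor \frac{k + 385}{2} \right\rfloor & \text{if } m = 1 \text{ and } 63 \leq k \leq 94 \\ \left\lfloor \frac{k + 479}{2} \right\rfloor - \left\lfloor \frac{k}{8} \right\rfloor \times 3 & \text{if } m = 2 \text{ and } k = 1, 3, 4, 5, 8, 12, 13, 14, 15 \\ \left\lfloor \frac{k + 411}{2} \right\rfloor & \text{if } m = 2 \text{ and } 78 \leq k \leq 94 \end{cases}$$

$$s_2 = \begin{cases} t + 63 & \text{if } k \text{ is odd and } 1 \leq t \leq 63 \\ t + 64 & \text{if } k \text{ is odd and } 64 \leq t \leq 94 \\ t + 158 & \text{if } k \text{ is even} \end{cases}$$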

In the above, $s_1 s_2$ is a two-byte Shift_JIS-2004 sequence, $m$ is the plane (面, men, surface) number (1 or 2), $k$ is the row (区, ku, ward) number (1–94) and $t$ is the cell (点, ten, point) number (1–94). The ku and ten numbers are equivalent to $j_1 - 32$ and $j_2 - 32$ respectively, where $j_1 j_2$ is a two-byte JIS sequence referencing a given plane.
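The plane, row and cell mapping can be sketched in Python as below. The names kuten_to_sjis2004 and PLANE2_LOW_ROWS are invented for this example, and the constants (257, 385, 479, 411, 63, 64, 158) are those of the Shift_JIS-2004 mapping in decimal; treat it as an illustration rather than a reference implementation.

```python
# Plane 2 rows that are packed into lead bytes 0xF0-0xF4.
PLANE2_LOW_ROWS = {1, 3, 4, 5, 8, 12, 13, 14, 15}


def kuten_to_sjis2004(m: int, k: int, t: int) -> tuple[int, int]:
    """Map plane m, row k, cell t of JIS X 0213 to Shift_JIS-2004 bytes (s1, s2)."""
    if not 1 <= t <= 94:
        raise ValueError("cell number must be in 1-94")
    if m == 1 and 1 <= k <= 62:
        s1 = (k + 257) // 2
    elif m == 1 and 63 <= k <= 94:
        s1 = (k + 385) // 2
    elif m == 2 and k in PLANE2_LOW_ROWS:
        s1 = (k + 479) // 2 - (k // 8) * 3
    elif m == 2 and 78 <= k <= 94:
        s1 = (k + 411) // 2
    else:
        raise ValueError("row is not allocated on this plane")
    if k % 2 == 1:
        # Odd rows: trail bytes 0x40-0x9E, skipping 0x7F.
        s2 = t + 63 if t <= 63 else t + 64
    else:
        # Even rows: trail bytes 0x9F-0xFC.
        s2 = t + 158
    return s1, s2


print([hex(b) for b in kuten_to_sjis2004(1, 16, 1)])  # same bytes as standard Shift JIS for 亜
print([hex(b) for b in kuten_to_sjis2004(2, 1, 1)])   # first cell of plane 2
```

For plane 1 the result agrees with the JIS X 0208 transformation once $k = j_1 - 32$ and $t = j_2 - 32$ are substituted, which is a useful consistency check.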

The same set of characters can be represented by EUC-JIS-2004, the EUC-JP based counterpart.

Some of the additions collide with popular Shift JIS extensions, including Windows codepage 932 which is used in web standards (see above). For example, compare plane 1 row 89 in JIS X 0213 (beginning 硃, 硎, 硏…)[23] to row 89 in the JIS X 0208 variant defined in web standards (beginning 纊, 褜, 鍈…).[24] In addition, some of the characters map to Unicode characters beyond the BMP.

Other variants

The space with lead bytes 0xF5 to 0xF9 (beyond the region used for JIS X 0208) is used by Japanese mobile phone operators for pictographs for use in E-mail.[25] KDDI goes further and defines hundreds more in the space with lead bytes 0xF3 and 0xF4.[26]

Beyond even this, there have been numerous minor variations made on Shift JIS, with individual characters here and there altered. Most of these extensions and variants have no IANA registration, so there is much scope for confusion when the extensions are used.

One further variant must be used to embed Shift JIS text in source code strings in C and similar programming languages. It doubles the byte 0x5C when it appears as the second byte of a two-byte character, but not when it appears as a single "¥" (ASCII: "\") character, because 0x5C is the beginning of an escape sequence. The best way of handling this is a special editor which encodes Shift JIS this way.
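The doubling rule can be sketched as a small byte-level filter. The function name escape_trail_backslashes is hypothetical, and the lead-byte ranges 0x81–0x9F and 0xE0–0xFC are an assumption that covers standard Shift JIS and the common extensions.

```python
def escape_trail_backslashes(sjis: bytes) -> bytes:
    """Double 0x5C when it is a trail byte; leave single-byte 0x5C untouched."""
    out = bytearray()
    i = 0
    while i < len(sjis):
        b = sjis[i]
        is_lead = 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC
        if is_lead and i + 1 < len(sjis):
            trail = sjis[i + 1]
            out += bytes([b, trail])
            if trail == 0x5C:
                out.append(0x5C)  # the compiler collapses the pair back to one byte
            i += 2
        else:
            out.append(b)  # single-byte character, including a real escape prefix
            i += 1
    return bytes(out)


source = "表".encode("shift_jis") + b"\\n"
print(escape_trail_backslashes(source).hex())  # 955c5c5c6e
```

In the output, the first 0x5C pair belongs to 表, while the final 0x5C 0x6E is the untouched escape sequence for a newline.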

Shift JIS byte map

As defined in JIS X 0208:1997

The chart below gives the detailed meaning of each byte in a stream encoded in standard Shift JIS (conforming to JIS X 0208:1997).

First byte
0x00–0x1F, 0x7F: Non-printable ASCII character
0x20–0x5B, 0x5D–0x7D: Unaltered ASCII character
0x5C (¥), 0x7E (‾): Modified ASCII character
0xA1–0xDF: Single-byte half-width katakana
0x81–0x9F, 0xE0–0xEF: First byte of a double-byte JIS X 0208 character
0x80, 0xA0, 0xF0–0xFF: Unused as first byte of a JIS X 0208 character

Second byte
0x40–0x7E, 0x80–0x9E: Second byte of a double-byte JIS X 0208 character whose first half of the JIS sequence was odd
0x9F–0xFC: Second byte of a double-byte JIS X 0208 character whose first half of the JIS sequence was even
0x00–0x3F, 0x7F, 0xFD–0xFF: Unused as second byte of a JIS X 0208 character
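The first-byte categories of standard Shift JIS can be expressed as a small classifier. The function name and the category labels are made up for this sketch, and the ranges are those implied by the JIS X 0208 transformation and the JIS X 0201 single-byte layout.

```python
def classify_first_byte(b: int) -> str:
    """Classify a byte that starts a character in standard Shift JIS."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x1F or b == 0x7F:
        return "control"
    if b in (0x5C, 0x7E):
        return "modified ASCII"  # yen sign and overline
    if b <= 0x7E:
        return "ASCII"
    if 0xA1 <= b <= 0xDF:
        return "half-width katakana"
    if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
        return "lead byte"
    return "unused"  # 0x80, 0xA0 and 0xF0-0xFF


for value in (0x41, 0x5C, 0xB1, 0x83, 0xF0):
    print(hex(value), classify_first_byte(value))
```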

With vendor or JIS X 0213 extensions

Some of the bytes which are not used for single-byte codes or initial bytes in JIS X 0208:1997 are used by certain extensions, resulting in the layout detailed in the chart below.

First byte
0x00–0x1F, 0x7F: Non-printable ASCII character
0x20–0x5B, 0x5D–0x7D: Unaltered ASCII character
0x5C (¥), 0x7E (‾): Modified ASCII character
0xA1–0xDF: Single-byte half-width katakana
0x81–0x84, 0x88–0x9F, 0xE0–0xEA: First byte of a double-byte character, used by JIS X 0208 (and by extensions such as JIS X 0213 plane 1)
0x85–0x87, 0xEB–0xEF: First byte of a double-byte character, unallocated in JIS X 0208 but used by JIS X 0213 plane 1 or by vendor extensions
0xF0–0xFC: First byte of a double-byte character beyond JIS X 0208, used for JIS X 0213 plane 2 or for unrelated extensions
0x80, 0xA0, 0xFD–0xFF: Not used as first byte, used by some single-byte extensions

Second byte
0x40–0x7E, 0x80–0x9E: Second byte of a double-byte character whose first half of the JIS sequence was odd
0x9F–0xFC: Second byte of a double-byte character whose first half of the JIS sequence was even
0x00–0x3F, 0x7F, 0xFD–0xFF: Unused as second byte of a double-byte character


See also

  • Japanese language and computers
  • Code page 932 (Microsoft Windows)
  • Mojibake
  • Shift JIS art

References

  1. ^ a b c "Character Sets". IANA.
  2. ^ a b "convutf8.c". OpenSolaris. Line 305. 2008-11-12.
  3. ^ a b "Additional Japanese iconv Modules". What's New in the Solaris 9 9/04 Operating Environment. Oracle Corporation.
  4. ^ "Historical trends in the usage of character encodings for websites, February 2021". w3techs.com. Retrieved 2021-02-11.
  5. ^ "Distribution of Character Encodings among websites that use .jp". w3techs.com. Retrieved 2022-10-25.
  6. ^ "Distribution of Character Encodings among websites that use Japanese". w3techs.com. Retrieved 2022-07-17.
  7. ^ j1 and j2 are each in the range 33 (0x21) to 126 (0x7e) inclusive (i.e., 7-bit character values excluding control characters (0–31 (0x1f) and 127 (0x7f)) and space)
  8. ^ a b "Encoding.WindowsCodePage Property – .NET Framework (current version)". MSDN. Microsoft.
  9. ^ "Code Page Identifiers". Windows Dev Center. Microsoft.
  10. ^ "IBM-943 and IBM-932". IBM Knowledge Center. IBM.
  11. ^ "CP932.TXT". Unicode Consortium.
  12. ^ "3.1.1 Details of Problems". Problems and Solutions for Unicode and User/Vendor Defined Characters. The Open Group Japan. Archived from the original on 1999-02-03.
  13. ^ Kaplan, Michael S. (2005-09-17). "When is a backslash not a backslash?".
  14. ^ Kaplan, Michael S (2007-05-26). "The PUA outside of Unicode". Sorting it all out.
  15. ^ "5. Indexes (§ Index jis0208)". Encoding Standard. WHATWG.
  16. ^ "4.2. Names and labels". Encoding Standard. WHATWG.
  17. ^ a b c "JAPANESE.TXT: Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later". Apple Computer, Inc.; Unicode Consortium.
  18. ^ Lunde, Ken (2019-03-21). "A Brief History of Japan's Era Name Ligatures". CJK Type Blog. Adobe Inc.
  19. ^ "Encoding Variants for MacJapanese". Apple Developer Documentation. Apple.
  20. ^ Lunde, Ken (2008). "Appendix E: Vendor Character Set Standards" (PDF). CJKV Information Processing. O'Reilly Media. ISBN 9780596514471.
  21. ^ "JIS X 0213 Code Mapping Tables". x0213.org.
  22. ^ "JIS X 0213の代表的な符号化方式 § Shift_JIS-2004" (in Japanese). Hexadecimal numbers in the source have been converted to decimal for display.
  23. ^ Japanese Industrial Standards Committee (2004-04-13). "Japanese Graphic Character Set for Information Interchange, Plane 1" (PDF). ITSCJ/IPSJ. ISO-IR-233. Archived from the original (PDF) on 2022-03-10.
  24. ^ "Index jis0208 visualization". Encoding Standard. WHATWG.
  25. ^ "Original Emoji from DoCoMo". FileFormat.info.
  26. ^ "Original Emoji from KDDI". FileFormat.info.

External links

  • Shift-JIS Kanji Table – a table of the non-ASCII part of the codeset
  • "Windows Codepage 932". Microsoft. May 1, 2005. Archived from the original on 2008-03-07. – Microsoft's definition
  • Forms of Shift-JIS in ICU (International Components for Unicode)
    • ibm-942 (sjis78)
    • ibm-943 (contains the \u00A5 ↔ \x5C mapping)
    • Shift JIS (contains the \u005C ↔ \x5C mapping)
