JIS X 0208

JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language. The official title of the current standard is 7-bit and 8-bit double byte coded KANJI sets for information interchange (7ビット及び8ビットの2バイト情報交換用符号化漢字集合, Nana-Bitto Oyobi Hachi-Bitto no Ni-Baito Jōhō Kōkan'yō Fugōka Kanji Shūgō). It was originally established as JIS C 6226 in 1978, and has been revised in 1983, 1990, and 1997. It is also called Code page 952 by IBM. The 1978 version is also called Code page 955 by IBM.

JIS X 0208
Alias(es): JIS C 6226
Language(s): Japanese, English, Russian, Bulgarian
Partial support: Greek, Chinese
Standard: JIS X 0208:1978 through 1997
Classification: ISO 2022, DBCS, CJK encoding
Extensions: ARIB STD B24 Kanji, NEC PC98 DBCS
Encoding formats
Preceded by: JIS X 0201
Succeeded by: JIS X 0213
Other related encoding(s): KS X 1001, GB 2312, JIS X 0212

Scope of use and compatibility

The character set JIS X 0208 establishes is primarily for the purpose of information interchange (情報交換, jōhō kōkan) between data processing systems and the devices connected to them, or mutually between data communication systems. This character set can be used for data processing and text processing.

Partial implementations of the character set are not considered compatible. The original drafting committee of the first standard took care to separate characters between level 1 and level 2, and the second standard shuffled some variant characters (異体字, itaiji) between the levels. From this it is conjectured that, at least for the first and second standards, Japanese computer systems implementing only the non-kanji and level 1 characters were at one time considered for development. However, such implementations have never been specified as compatible, though examples such as the early NEC PC-9801 did exist.[1]

Even though the JIS X 0208:1997 standard contains provisions concerning compatibility, it is at present generally considered that this standard neither certifies compatibility nor serves as an official manufacturing standard that amounts to a declaration of self-compatibility.[2] Consequently, de facto, JIS X 0208-"compatible" products are not considered to exist. Terminology such as "conformant" (準拠, junkyo) and "support" (対応, taiō) is included in JIS X 0208, but the semantics of these terms vary from person to person.

Code charts

Lead byte

The first encoding byte is the row number plus 0x20 (32 in decimal), and the second byte is the cell number plus 0x20 in the same way (see below). Hence, the code set starting with 0x21 has a row number of 1, and its cell 1 has a continuation byte of 0x21 (or 33), and so forth.

For lead bytes used for characters other than kanji, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for kanji, links are provided to the appropriate section of Wiktionary's kanji index.

JIS X 0208 (lead bytes)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP  1-_ 2-_ 3-_ 4-_ 5-_ 6-_ 7-_ 8-_ 9-_ 10-_ 11-_ 12-_ 13-_ 14-_ 15-_
3x 16-_ 17-_ 18-_ 19-_ 20-_ 21-_ 22-_ 23-_ 24-_ 25-_ 26-_ 27-_ 28-_ 29-_ 30-_ 31-_
4x 32-_ 33-_ 34-_ 35-_ 36-_ 37-_ 38-_ 39-_ 40-_ 41-_ 42-_ 43-_ 44-_ 45-_ 46-_ 47-_
5x 48-_ 49-_ 50-_ 51-_ 52-_ 53-_ 54-_ 55-_ 56-_ 57-_ 58-_ 59-_ 60-_ 61-_ 62-_ 63-_
6x 64-_ 65-_ 66-_ 67-_ 68-_ 69-_ 70-_ 71-_ 72-_ 73-_ 74-_ 75-_ 76-_ 77-_ 78-_ 79-_
7x 80-_ 81-_ 82-_ 83-_ 84-_ 85-_ 86-_ 87-_ 88-_ 89-_ 90-_ 91-_ 92-_ 93-_ 94-_ DEL

Non-Kanji rows

Character set 0x21 (row number 1, special characters)

Some vendors use slightly different Unicode mapping for this set than the one below. For example, Microsoft maps kuten 1-29 (JIS 0x213D) to U+2015 (Horizontal Bar),[3] whereas Apple maps it to U+2014 (Em Dash).[4] Similarly, Microsoft maps kuten 1-61 (JIS 0x215D) to U+FF0D[3] (the fullwidth form of U+002D Hyphen-Minus), and Apple maps it to U+2212 (Minus Sign).[4] Unicode mapping of the wave dash also differs between vendors. See the cells with footnotes below.
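The two vendor differences named above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary `VENDOR_MAPPINGS` and the helper `describe` are hypothetical names, and the table holds just the two cells discussed in the paragraph, not a complete vendor mapping.

```python
import unicodedata

# Vendor-specific Unicode mappings for two row-1 cells, as stated in the text.
# Keys are kuten (row, cell); names and layout here are illustrative only.
VENDOR_MAPPINGS = {
    (1, 29): {"Microsoft": "\u2015", "Apple": "\u2014"},
    (1, 61): {"Microsoft": "\uff0d", "Apple": "\u2212"},
}


def describe(kuten):
    """Return {vendor: 'U+XXXX NAME'} for one kuten in the table above."""
    return {
        vendor: f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for vendor, ch in VENDOR_MAPPINGS[kuten].items()
    }


for kuten in VENDOR_MAPPINGS:
    print(kuten, describe(kuten))
```

A converter that must round-trip text between vendors would need such a table for every cell where the mappings diverge, including the wave dash.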

ASCII and JISCII punctuation (shown here with a yellow background) may use alternative mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as Shift JIS, EUC-JP or ISO 2022-JP.

JIS X 0208 (prefixed with 0x21)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x IDSP , . : ; ? ! ´ ` ¨
3x ^ _ [b] /
4x \ [c] [d] | ( ) [ ]
5x { } + [e] ± ×
6x ÷ = < > ° ¥
7x $ ¢ £ % # & * @ §

Character set 0x22 (row number 2, special characters)

Most of the characters in this set were added in 1983, except for characters 0x2221–0x222E (kuten 2-1 through 2-14, or the first line of the chart below), which were included in the original 1978 version of the standard.

JIS X 0208 (prefixed with 0x22)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x
4x ¬
5x
6x
7x

Character set 0x23 (row number 3, digits and Roman)

This set includes a subset of the ISO 646 invariant set (and therefore also a subset of both ASCII and the JIS X 0201 Roman set), minus punctuation and symbols, comprising western Arabic numerals and both cases of the Basic Latin alphabet. Characters in this set may use alternative Unicode mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as EUC-JP, Shift JIS or ISO 2022-JP.

Compare row 3 of KPS 9566, which this row exactly matches. Compare and contrast row 3 of KS X 1001 and of GB 2312, which include their entire national variants of ISO 646 in this row, rather than only the alphanumeric subset.

JIS X 0208 (prefixed with 0x23)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x 0 1 2 3 4 5 6 7 8 9
4x A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z
6x a b c d e f g h i j k l m n o
7x p q r s t u v w x y z

Character set 0x24 (row number 4, Hiragana)

This row contains Japanese Hiragana.

Compare row 4 of GB 2312, which matches this row. Compare and contrast row 10 of KPS 9566 and of KS X 1001, which use the same layout, but in a different row.

Character set 0x25 (row number 5, Katakana)

This row contains Japanese Katakana.

Compare row 5 of GB 2312, which matches this row. Compare and contrast row 11 of KPS 9566 and of KS X 1001, which use the same layout, but in a different row. Contrast the considerably different Katakana layout used by JIS X 0201.

Character set 0x26 (row number 6, Greek)

This row contains basic support for the modern Greek alphabet, without diacritics or the final sigma.

Compare row 6 of GB 2312 and GB 12345 and row 6 of KPS 9566, which include the same Greek letters in the same layout, although GB 12345 adds vertical presentation forms and KPS 9566 adds Roman numerals. Compare and contrast row 5 of KS X 1001, which offsets the Greek letters to include the Roman numerals first.

JIS X 0208 (prefixed with 0x26)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο
3x Π Ρ Σ Τ Υ Φ Χ Ψ Ω
4x α β γ δ ε ζ η θ ι κ λ μ ν ξ ο
5x π ρ σ τ υ φ χ ψ ω
6x
7x

Character set 0x27 (row number 7, Cyrillic)

This row contains the modern Russian alphabet and is not necessarily sufficient for representing other forms of the Cyrillic script.

Compare row 7 of GB 2312, which matches this row. Compare and contrast row 12 of KS X 1001 and row 5 of KPS 9566, which use the same layout (but in a different row).

JIS X 0208 (prefixed with 0x27)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x А Б В Г Д Е Ё Ж З И Й К Л М Н
3x О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э
4x Ю Я
5x а б в г д е ё ж з и й к л м н
6x о п р с т у ф х ц ч ш щ ъ ы ь э
7x ю я

Character set 0x28 (row number 8, box drawing)

All characters in this set were added in 1983, and were not present in the original 1978 revision of the standard.

JIS X 0208 (prefixed with 0x28)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x
4x
5x
6x
7x

Extension character set 0x2D (row number 13, NEC special characters)

Rows 9 through 15 of the JIS X 0208 standard are left empty.

However, the following layout for row 13, first introduced by NEC, is a common extension. It is used (with minor variations, noted in footnotes) by Windows-932[3] (which is matched by the WHATWG Encoding Standard used by HTML5), by the PostScript variant (but, since KanjiTalk version 7, not the regular variant)[5] of MacJapanese, and by JIS X 0213 (the successor to JIS X 0208).[5][6] Unlike the other extensions made by Windows-932/WHATWG and JIS X 0213, the two match rather than colliding, so decoding of most of this row is better supported than the other extensions made by JIS X 0213.

NEC Special Characters for JIS X 0208 (prefixed by 0x2D)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x
3x [f]
4x
5x [f] [g]
6x
7x [h] [h] [h] [h] [h] [h] [h] [h] [h] [f] [f]

Kanji rows

Code structure

In order to represent code points, column/line numbers are used for one-byte codes and kuten numbers are used for two-byte codes. For a way to identify a character without depending on a code, character names are used.

Single byte codes

Almost all JIS X 0208 graphic character codes are represented with two bytes of at least seven bits each. However, every control character, as well as the plain space – although not the ideographic space – is represented with a one-byte code. In order to represent the bit combination (ビット組合せ, bitto kumiawase) of a one-byte code, two decimal numbers – a column number and a line number – are used. Three high-order bits out of seven or four high-order bits out of eight, counting from zero to seven or from zero to fifteen respectively, form the column number. Four low-order bits counting from zero to fifteen form the line number. Each decimal number corresponds to one hexadecimal digit. For example, the bit combination corresponding to the graphic character "space" is 010 0000 as a 7-bit number, and 0010 0000 as an 8-bit number. In column/line notation, this is represented as 2/0. Other representations of the same single-byte code include 0x20 as hexadecimal, or 32 as a single decimal number.
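The column/line notation described above amounts to splitting a one-byte code into its high-order and low-order bits. The following is a minimal sketch of that split; the function name `column_line` and its `bits` parameter are assumptions made for illustration and do not come from the standard.

```python
def column_line(code, bits=7):
    """Format a one-byte code in column/line notation, e.g. 0x20 -> '2/0'."""
    if bits not in (7, 8):
        raise ValueError("bits must be 7 or 8")
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range for the given width")
    column = code >> 4    # high-order 3 bits (7-bit code) or 4 bits (8-bit code)
    line = code & 0x0F    # low-order 4 bits
    return f"{column}/{line}"


print(column_line(0x20))  # the graphic character "space": 2/0
print(column_line(0x7E))  # the last 7-bit printing position: 7/14
```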

Code points and code numbers

The double-byte codes are laid out in 94 numbered groups, each called a row (区, ku, lit. "section"). Every row contains 94 numbered codes, each called a cell (点, ten, lit. "point").[i] This makes a total of 8836 (94 × 94) possible code points (although not all are assigned; see below); these are laid out in the standard in a 94-line, 94-column code table.

A row number and a cell number (each numbered from 1 to 94, for a standard JIS X 0208 code) form a kuten (区点) point, which is used to represent double-byte code points. A code number or kuten number (区点番号, kuten bangō) is expressed in the form "row-cell", the row and cell numbers being separated by a hyphen. For example, the character "亜" has a code point at row 16, cell 1, so its code number is represented as "16-01".

In 7-bit JIS X 0208 (as might be switched to in JIS X 0202 / ISO-2022-JP), both bytes must be from the 94-byte range of 0x21 (used for row or cell number 1) through 0x7E (used for row or cell number 94) – exactly corresponding to the range used for 7-bit ASCII printing characters, not counting the space. Accordingly, the encoded bytes are obtained by adding 0x20 (32) to each number.[7] For instance, the above example of 16-01 ("亜") would be represented by the bytes 0x30 0x21. The 8-bit EUC-JP instead uses the range 0xA1 through 0xFE (setting the high bit to 1), whereas other encodings such as Shift JIS use more complicated transforms. Shift JIS includes more encoding space than is needed for JIS X 0208 itself; some Shift JIS specific extensions to JIS X 0208 make use of row numbers above 94.[8]
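The arithmetic in this paragraph can be sketched as follows. The function names are hypothetical, and zero-padding the cell number simply follows the "16-01" example above. Only the 7-bit and EUC-JP forms are covered; the Shift JIS transform is more involved and is left out. Python's built-in `euc_jp` codec is used solely as a cross-check.

```python
def kuten_to_jis(row, cell):
    """Return the 7-bit JIS X 0208 byte pair for a kuten (row, cell)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    return bytes([row + 0x20, cell + 0x20])


def kuten_to_euc_jp(row, cell):
    """Return the EUC-JP byte pair: the 7-bit bytes with the high bit set."""
    return bytes(b | 0x80 for b in kuten_to_jis(row, cell))


def jis_to_kuten(pair):
    """Invert kuten_to_jis by subtracting 0x20 from each 7-bit byte."""
    if len(pair) != 2 or not all(0x21 <= b <= 0x7E for b in pair):
        raise ValueError("expected two bytes in 0x21..0x7E")
    return (pair[0] - 0x20, pair[1] - 0x20)


def format_kuten(row, cell):
    """Format a kuten number; the cell is zero-padded as in '16-01'."""
    return f"{row}-{cell:02d}"


print(format_kuten(16, 1), kuten_to_jis(16, 1).hex(), kuten_to_euc_jp(16, 1).hex())
# Cross-check against Python's built-in EUC-JP codec.
print(kuten_to_euc_jp(16, 1).decode("euc_jp"))
```

For kuten 16-01 this yields the bytes 0x30 0x21 in the 7-bit form and 0xB0 0xA1 in EUC-JP.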

This structure is also used in the Mainland Chinese GB 2312, where it is natively known as 区位; qūwèi, and the South Korean KS C 5601 (currently KS X 1001), where the ku and ten are respectively known as hang[9] (행; 行; haeng) and yol[9] (열; 列; yeol). The later JIS X 0213 extends this structure by having more than one plane (面, men, lit. "face") of rows, which is also the structure used by CNS 11643, and related to the structure used by CCCII.

Unassigned code points

Among the 2-byte codes, rows 9 to 15 and 85 to 94 are unassigned code points (空き領域, aki ryōiki); that is, they are code points with no characters assigned to them. Some cells in other rows are also effectively unassigned.

These empty areas contain code points that should, as a rule, not be used. Except when there is prior agreement among the relevant parties, characters (gaiji) for information interchange should not be assigned to the unassigned code points.

Even when assigning characters to unassigned code points, graphic characters defined in the standard should not be assigned to them, and the same character should not be assigned to multiple unassigned code points; characters should not be duplicated in the set.

Furthermore, when assigning characters to unassigned code points, it is necessary to be cautious of unification with regard to kanji glyphs. For example, row 25 cell 66 corresponds to the kanji meaning "high" or "expensive"; both the form with a component resembling the "mouth" character (口) in the middle (高) and the less common form with a ladder-like construction in the same location (髙) are subsumed into the same code point. Consequently, limiting point 25-66 to the "mouth" form and assigning the latter "ladder" form to an unassigned code point would technically be in violation of the standard.

In practice, however, several vendor-specific Shift JIS variants, including Windows-932 and MacJapanese, encode vendor extensions in unallocated rows of the encoding space for JIS X 0208. Also, most of the codes unassigned in JIS X 0208 are assigned by the newer JIS X 0213 standard.

Character names

Each JIS X 0208 character is given a name. By using a character's name, it is possible to identify characters without relying on their codes. The names of characters are coordinated with other character set standards, notably the Universal Coded Character Set (UCS/Unicode), so this is one possible source of character mappings to character sets such as Unicode. For example, both the character at ISO/IEC 646 International Reference Version (US-ASCII) column 4 line 1 and the one at JIS X 0208 row 3 cell 33 have the name "LATIN CAPITAL LETTER A". Therefore, the character at 4/1 in ASCII and the character at 3-33 in JIS X 0208 can be regarded as the same character (although, in practice, alternative mapping is used for the JIS X 0208 character due to encodings providing ASCII separately). Conversely, ASCII characters 2/2 (quotation mark), 2/7 (apostrophe), 2/13 (hyphen-minus), and 7/14 (tilde) can be determined to be characters that do not exist in this standard.

Character names of non-kanji characters use uppercase Roman letters, spaces, and hyphens. Non-kanji characters are given a Japanese-language common name (日本語通用名称, Nihongo tsūyō meishō), but some provisions for these names do not exist.[j] The names of kanji, on the other hand, are mechanically set according to the corresponding hexadecimal representation of their code in UCS/Unicode. The name of a kanji can be arrived at by prepending "CJK UNIFIED IDEOGRAPH-" to its Unicode code point. For example, row 16 cell 1 (亜) corresponds to U+4E9C in UCS, so its name is "CJK UNIFIED IDEOGRAPH-4E9C". Kanji are not given Japanese common names.
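The mechanical naming rule for kanji can be sketched in a few lines. The helper `kanji_name` is a hypothetical name; Python's standard `unicodedata` module is used only to show that the UCS/Unicode names agree with the rule.

```python
import unicodedata


def kanji_name(ch):
    """Build a kanji's character name from its UCS/Unicode code point."""
    return f"CJK UNIFIED IDEOGRAPH-{ord(ch):04X}"


# U+4E9C is the kanji at row 16 cell 1.
print(kanji_name("\u4e9c"))
print(unicodedata.name("\u4e9c"))
# The non-kanji example from the text: ISO/IEC 646 IRV position 4/1.
print(unicodedata.name("A"))
```

Note that the rule applies only to kanji; non-kanji names such as "LATIN CAPITAL LETTER A" have to be looked up rather than computed.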

Kanji set

Overview

JIS X 0208 prescribes a set of 6879 graphical characters that correspond to two-byte codes with either seven or eight bits to the byte; in JIS X 0208, this is called the kanji set (漢字集合, kanji shūgō), which includes 6355 kanji as well as 524 non-kanji (非漢字, hikanji), including characters such as Latin letters, kana, and so forth.

Special characters
Occupies rows 1 and 2. There are 18 descriptor symbols (記述記号, kijutsu kigō) such as the "ideographic space" ( ), and the Japanese comma and period; eight diacritical marks such as dakuten and handakuten; 10 characters for things that follow kana or kanji (仮名又は漢字に準じるもの, kana mata wa kanji ni junjiru mono) such as the Iteration mark; 22 bracket symbols (括弧記号, kakko kigō); 45 mathematical symbols (学術記号, gakujutsu kigō); and 32 unit symbols, which includes the currency sign and the postal mark, for a total of 147 characters.
Numerals
Occupies part of row 3. The ten digits from "0" to "9".
Latin letters
Occupies part of row 3. The 26 letters of the English alphabet in uppercase and lowercase form for a total of 52.
Hiragana
Occupies row 4. Contains 48 unvoiced kana (including the obsolete wi and we), 20 voiced kana (dakuten), 5 semi-voiced kana (handakuten), 10 small kana for palatalized and assimilated sounds, for a total of 83 characters.
Katakana
Occupies row 5. There are 86 characters: the katakana equivalents of the hiragana characters, plus the small ka/ke kana (ヵ/ヶ) and the vu kana (ヴ).
Greek letters
Occupies row 6. The 24 letters of the Greek alphabet in uppercase and lowercase form (minus the final sigma) for a total of 48.
Cyrillic letters
Occupies row 7. The 33 letters of the Russian alphabet in uppercase and lowercase form for a total of 66.
Box-drawing characters
Occupies row 8. Thin segments, thick segments, and mixed thin and thick segments, 32 total.
Kanji
The 2965 characters of level 1 (第1水準, dai ichi suijun) from row 16 to row 47, and the 3390 characters of level 2 (第2水準, dai ni suijun) from row 48 to row 84 for a total of 6355.
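As a quick arithmetic check, the per-category counts in the list above can be summed to confirm the totals given in the overview (524 non-kanji, 6355 kanji, 6879 characters in all). The dictionary names are illustrative only.

```python
# Character counts per category, taken from the list above.
NON_KANJI = {
    "special characters": 147,
    "numerals": 10,
    "Latin letters": 52,
    "hiragana": 83,
    "katakana": 86,
    "Greek letters": 48,
    "Cyrillic letters": 66,
    "box-drawing characters": 32,
}
KANJI = {"level 1": 2965, "level 2": 3390}

non_kanji_total = sum(NON_KANJI.values())
kanji_total = sum(KANJI.values())
print(non_kanji_total, kanji_total, non_kanji_total + kanji_total)
# prints: 524 6355 6879
```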

Special characters, numerals, and Latin characters

As for the special characters in the kanji set, some characters from the graphic character set of the International Reference Version (IRV) of ISO/IEC 646:1991 (equivalent to ASCII) are absent from JIS X 0208. There are the aforementioned four characters "QUOTATION MARK", "APOSTROPHE", "HYPHEN-MINUS", and "TILDE". The former three are split into different code points in the kanji set (Nishimura, 1978; JIS X 0221-1:2001 standard, Section 3.8.7). The "TILDE" of IRV has no corresponding character in the kanji set.

In the following table, the ISO/IEC 646:1991 IRV characters in question are compared with their multiple equivalents in JIS X 0208, except for the IRV character "TILDE", which is compared with the "WAVE DASH" of JIS X 0208. The entries under the "Symbol" columns utilize UCS/Unicode code points, so the specifics of display may differ.

The ASCII/IRV characters without exact JIS X 0208 equivalents were later assigned code points by JIS X 0213; these are also listed below, as are Microsoft's mappings of the four characters.

Non-strict correspondence between ISO/IEC 646:1991 IRV (ASCII) and JIS X 0208
Each ISO/IEC 646:1991 IRV entry gives column/line, symbol, and name, with its JIS X 0213[6] and Microsoft code points in parentheses; the JIS X 0208 entries listed under it give kuten, symbol, and name.

2/2 " QUOTATION MARK (JIS X 0213: 1-2-16; Microsoft: 92-94[A], 115-24[B])
  • 1-15 ¨ DIAERESIS
  • 1-40 “ LEFT DOUBLE QUOTATION MARK
  • 1-41 ” RIGHT DOUBLE QUOTATION MARK
  • 1-77 ″ DOUBLE PRIME
2/7 ' APOSTROPHE (JIS X 0213: 1-2-15; Microsoft: 92-93[A], 115-23[B])
  • 1-13 ´ ACUTE ACCENT
  • 1-38 ‘ LEFT SINGLE QUOTATION MARK
  • 1-39 ’ RIGHT SINGLE QUOTATION MARK
  • 1-76 ′ PRIME
2/13 - HYPHEN-MINUS (JIS X 0213: 1-2-17; Microsoft: 1-61[C])
  • 1-30 ‐ HYPHEN
  • 1-61 − MINUS SIGN
7/14 ~ TILDE (JIS X 0213: 1-2-18; Microsoft: 1-33[D])
  • (no corresponding character)
(no corresponding character)
  • 1-33 〜 WAVE DASH[D]

  A. From "NEC selection of IBM extensions". Occupies a code point unallocated in JIS X 0208.
  B. From "IBM extensions". Outside range of JIS X 0208, but encodable in Shift_JIS.
  C. Microsoft treat the JIS minus sign as a fullwidth form of the hyphen-minus.
  D. Wave Dash is sometimes treated as a fullwidth form of the tilde, e.g. by Microsoft (see Tilde § Unicode and Shift JIS encoding of wave dash). The ASCII / IRV tilde is an ambiguous code point which may appear either as a tilde accent mark (˜) or as a dash with the same curvature (∼), although the dash is more common due to the spacing accent having a separate code point in Windows-1252; there is no JIS X 0208 character for a tilde accent. Character 1-2-18 in JIS X 0213 is shown as a tilde accent in the code chart.[6]

Because of these differences, the kanji set is not upward-compatible with the IRV, which makes it the most widespread non-upward-compatible character set in the world; this is counted as one of the weak points of this standard.

Even with the 90 special characters, numerals, and Latin letters the kanji set and the IRV set have in common, this standard does not follow the arrangement of ISO/IEC 646. These 90 characters are split between rows 1 (punctuation) and 3 (letters and numbers), although row 3 does follow ISO 646 arrangement for the 62 letters and numbers alone (e.g. 4/1 ("A") in ISO 646 becomes 2/3 4/1 (i.e. 3-33) in JIS X 0208).
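The row 3 correspondence for the 62 letters and digits can be sketched as below. The function names are hypothetical; the only rule used is the one stated above, namely that an IRV code such as 4/1 keeps its position and gains the lead byte 2/3 (0x23).

```python
def irv_alnum_to_kuten(ch):
    """Map an ISO/IEC 646 IRV letter or digit to its JIS X 0208 kuten in row 3."""
    if len(ch) != 1 or not (ch.isascii() and ch.isalnum()):
        raise ValueError("expected a single ASCII letter or digit")
    # Row 3 keeps the ISO 646 position: cell = one-byte code - 0x20.
    return (3, ord(ch) - 0x20)


def irv_alnum_to_jis(ch):
    """Return the 7-bit JIS X 0208 bytes: lead byte 0x23, then the IRV code."""
    row, cell = irv_alnum_to_kuten(ch)
    return bytes([row + 0x20, cell + 0x20])


print(irv_alnum_to_kuten("A"))      # 4/1 in ISO 646 -> (3, 33)
print(irv_alnum_to_jis("A").hex())  # 2/3 4/1 -> 2341
```

The 28 shared punctuation characters and symbols in row 1 do not follow this pattern and would need an explicit table.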

These incompatibilities are thought to be the reason why the numerals, Latin letters, and so forth in the kanji set came to be treated as "full-width alphanumeric characters" (全角英数字, zenkaku eisūji), and why the original implementations interpreted them differently from their IRV counterparts.

Ever since the first standard, it has been possible to represent composites (合成, gōsei) such as encircled numbers, ligatures for measurement unit names, and Roman numerals;[10] they were not given independent kuten code points. Although individual companies that manufacture information systems can make an effort to represent these characters as customers may require by the composition of the characters, none has requested to have them added to the standard, instead choosing to proprietarily offer them as gaiji.

In the fourth standard (1997), all these characters were explicitly defined as characters that accompany an advancement of the current position; that is to say, they are spacing characters. Furthermore, it was ruled that they should not be made by the composition of characters. For this reason, it became disallowed to represent Latin characters with diacritics at all, with possibly the sole exception of the ångström symbol (Å) at row 2 cell 82.

Hiragana and katakana

The hiragana and katakana in JIS X 0208, unlike the katakana in JIS X 0201, include dakuten and handakuten markings as part of a character. The katakana wi (ヰ) and we (ヱ) (both obsolete in modern Japanese) as well as the small wa (ヮ), not in JIS X 0201, are also included.

The arrangement of kana in JIS X 0208 is different from the arrangement of katakana in JIS X 0201. In JIS X 0201, the syllabary starts with wo (ヲ), followed by the small kana sorted by gojūon order, followed by the full-size kana, also in gojūon order (ヲァィゥェォャュョッーアイウエオ......ラリルレロワン). On the other hand, in JIS X 0208, the kana are sorted first by gojūon order, then in the order of "small kana, full-size kana, kana with dakuten, and kana with handakuten" such that the same fundamental kana is grouped with its derivatives (ぁあぃいぅうぇえぉお......っつづ......はばぱひびぴふぶぷへべぺほぼぽ......ゎわゐゑをん). This ordering was chosen in order to more simply facilitate the sorting of kana-based dictionary look-ups (Yasuoka, 2006).[k]
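The effect of this arrangement on sorting can be illustrated by ordering kana by their kuten. In this sketch the helper `kana_kuten` is a hypothetical name, and Python's built-in `euc_jp` codec is assumed only as a convenient way to obtain the JIS X 0208 row and cell of a character.

```python
def kana_kuten(ch):
    """Return the JIS X 0208 kuten (row, cell) of a kana via its EUC-JP bytes."""
    data = ch.encode("euc_jp")
    if len(data) != 2:
        raise ValueError("not a two-byte JIS X 0208 character")
    # EUC-JP bytes are the row and cell numbers plus 0xA0.
    return (data[0] - 0xA0, data[1] - 0xA0)


kana = ["ぱ", "ひ", "は", "ば"]
# Sorting by code number keeps each base kana next to its voiced derivatives.
print(sorted(kana, key=kana_kuten))
# prints: ['は', 'ば', 'ぱ', 'ひ']
```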

As mentioned above, JIS X 0208 did not follow the katakana order previously defined in JIS X 0201. It is thought that the JIS X 0201 katakana came to be treated as "half-width kana" because of this incompatibility with the katakana of this standard. This point is also counted as one of the weaknesses of this standard.

Kanji

The fourth standard (1997) explains in detail which sources the kanji in this standard were chosen from, why they are split into level 1 and level 2, and how they are arranged. Per that explanation, the kanji included in the following four kanji listings were reflected in the 6349 characters of the first standard (1978).

  • Kanji Listing for Standard Code (Tentative) (標準コード用漢字表 (試案), Hyōjun Kōdo-yō Kanjihyō (Shian))
The Information Processing Society of Japan kanji code committee compiled this list in 1971. In the below "Correspondence Analysis Results", this appears to be 6086 characters.
  • Basic Kanji for Administrative Data Processing Use (行政情報処理用基本漢字, Gyōsei Jōhō Shoriyō Kihon Kanji)
Selected by the Administrative Management Agency of Japan in 1975, it consists of 2817 characters. As data for the selection, the Agency produced a report that compared several kanji listings, starting with the "Kanji Listing for Standard Code (Tentative)": the "Correspondence Analysis Results and Frequency of Use of Kanji for Administrative Data Processing Use Normal Kanji Selection" (行政情報処理用標準漢字選定のための漢字の使用頻度および対応分析結果, Gyōsei Jōhō Shoriyō Kihon Kanji Sentei no Tame no Kanji no Shiyō Hindo Oyobi Taiō Bunseki Kekka), or "Correspondence Analysis Results" (対応分析結果, Taiō Bunseki Kekka) for short.
  • Japanese Personality Registration Name Kanji (日本生命収容人名漢字, Nihon Seimei Shūyō Jinmei Kanji)
One of the kanji listings that compose the "Correspondence Analysis Results", consisting of 3044 characters. It no longer exists. The original drafting committee did not have the list itself; its kanji were reflected in the standard by following the "Correspondence Analysis Results".
  • Kanji for National Administrative District Listing (国土行政区画総覧使用漢字, Kokudo Gyōsei Kukaku Sōran Shiyō Kanji)
One of the kanji listings that compose the "Correspondence Analysis Results", consisting of 3251 characters. They are the kanji used in the list of all administrative place names compiled by the Japan Geographic Data Center, the "National Administrative District Listing" (国土行政区画総覧, Kokudo Gyōsei Kukaku Sōran). The original drafting committee did not investigate the listing itself; the kanji used from this list followed the "Correspondence Analysis Results".

The second and third standards added four and two characters to level 2, respectively, bringing the total kanji to 6355. The second standard also changed character forms and transposed characters between the levels; the third standard changed character forms as well. These are described further below.

Level partitioning

The 2,965 Level 1 kanji occupy rows 16 to 47. The 3,390 Level 2 kanji occupy rows 48 to 84.
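The row ranges given here, together with the unassigned rows 9 to 15 and 85 to 94 described earlier, can be turned into a simple row classifier. This is a rough sketch with a hypothetical function name and category labels; it works at row granularity only and does not account for unassigned cells within an otherwise assigned row.

```python
def classify_row(row):
    """Classify a JIS X 0208 row number (1..94) by what the standard places there."""
    if not 1 <= row <= 94:
        raise ValueError("row must be in 1..94")
    if row <= 8:
        return "non-kanji"
    if 16 <= row <= 47:
        return "level 1 kanji"
    if 48 <= row <= 84:
        return "level 2 kanji"
    return "unassigned"  # rows 9 to 15 and 85 to 94


# Each level has slightly more cells than characters, so some cells stay empty.
level1_cells = (47 - 16 + 1) * 94   # 3008 cells for 2965 kanji
level2_cells = (84 - 48 + 1) * 94   # 3478 cells for 3390 kanji
print(classify_row(16), classify_row(48), classify_row(13))
print(level1_cells, level2_cells)
```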

For level 1, characters common to multiple kanji glyph listings were chosen, using the tōyō kanji, the tōyō kanji correction draft, and the jinmeiyō kanji as a basis. Also, JIS C 6260 ("To-Do-Fu-Ken (Prefecture) Identification Code"; currently JIS X 0401) and JIS C 6261 ("Identification code for cities, towns and villages"; currently JIS X 0402) were consulted; kanji for nearly all Japanese prefectures, cities, districts, wards, towns, villages, and so forth were intentionally placed in level 1.[l] Furthermore, amendments by experts were added.

Level 2 was dedicated to kanji that made an appearance in the aforementioned four major listings but were not selected for level 1. As noted below, the kanji of level 1 were ordered by their pronunciation, so among the kanji whose pronunciation were difficult to determine, there were those that were transferred from level 1 to level 2 on that basis (Nishimura, 1978).

Due to these decisions, for the most part, level 1 contains more frequently used kanji, and level 2 contains more infrequently used kanji, but of course, those were judged by the standards of the day; over the passage of time, some level 2 kanji have become more frequently used, such as one meaning "to soar" (翔) and one meaning "to glitter" (煌); and inversely, some level 1 kanji have become infrequent, notably the ones meaning "centimeter" (糎) and "millimeter" (粍). Of the current jōyō kanji, 30 fall into level 2,[m] while three are missing altogether (塡󠄀, 剝󠄀 and 頰󠄀).[n] Of the current jinmeiyō kanji, 192 are in level 2,[o] while 105 are not part of the standard.[p]

Arrangement

The kanji in level 1 are sorted in order of each one's "representative reading" (i.e. a canonical reading chosen for the purposes of this standard only); the reading of a kanji for this may be an on or a kun reading; readings are sorted in gojūon order.[q] As a general rule, the on (Chinese-sound) reading is considered the representative reading; where a kanji has multiple on readings, the reading judged to be predominant in use frequency is used for the representative reading (JIS C 6226-1978 standard, Section 3.4). For the small percentage of kanji that either do not have an on reading or have an on reading which is little known and not in common use, the kun reading was employed as the representative reading. Where a verb kun reading must be used as the representative reading, the ren'yōkei (rather than the shūshikei) form is used.

For example, cells 1 to 41 on row 16 are 41 characters sorted as starting with a reading of a. Within these, 22 characters, including 16-10 (葵: on reading "ki"; kun reading "aoi") and 16-32 (粟: on readings "zoku" and "shoku"; kun reading "awa") are there on the basis of their kun readings. 16-09 (逢: on reading "hō", kun reading "a(i)") and 16-23 (扱: on readings "sō" and "kyū", kun reading "atsuka(i)") are just two examples of ren'yōkei-form verbs used for the representative reading.

Where the representative reading is the same between different kanji, a kanji that uses an on reading is placed ahead of one that uses a kun reading. Where the on or kun readings are the same between more than one kanji, they are then ordered by their primary radical and stroke count.

Whether in level 1 or level 2, itaiji are arranged to directly follow their exemplar form. For example, in level 2, right after row 49 cell 88 (劍), the immediately following characters deviate from the general rule (stroke count in this case) to include three variants of 49-88 (劔, 劒, and 剱).[r]

The kanji in level 2 are arranged in order of primary radical and stroke count. Where these two properties are the same for different kanji, they are then sorted by reading.

Kanji from unknown sources

Kanji for which sources are unclear, unknown, or otherwise unidentifiable in JIS X 0208:1997 Appendix 7
Kuten Symbol Classification
52-55 墸 Unknown
52-63 壥 Unknown
54-12 妛 Source unclear
55-27 彁 Unidentifiable
57-43 挧 Source unclear
58-83 暃 Source unclear
59-91 椦 Source unclear
60-57 槞 Source unclear
74-12 蟐 Source unclear
74-57 袮 Source unclear
79-64 閠 Source unclear
81-50 駲 Source unclear

It has been pointed out that there are kanji in the kanji set that are not found in comprehensive, unabridged kanji dictionaries, and that the sources thereof are unknown. For example, only one year after the first standard was established, Tajima (1979) reported that he had confirmed 63 kanji that were not to be found in Shinjigen (a large kanji dictionary published by Kadokawa Shoten), nor in Dai Kan-Wa jiten, and they did not make sense as ryakuji of any sort; he noted that it would be preferable for kanji not available in kanji dictionaries to be selected from definite sources. These kanji came to be known as "ghost" characters (幽霊文字, yūrei moji) or "ghost kanji" (幽霊漢字, yūrei kanji), among other names.

The drafting committee for the fourth version of the standard also saw the existence of kanji with sources unknown as a problem, and so made an inquiry into just what kind of sources the drafting committee of the first version referenced. As a result, it was discovered that the original drafting committee had heavily relied on the "Correspondence Analysis Results" to collect kanji. When the drafting committee investigated the "Correspondence Analysis Results", it became clear that many of the kanji included in the kanji set but not found in exhaustive kanji dictionaries supposedly came from the "Japanese Personality Registration Name Kanji" and "Kanji for National Administrative District Listing" lists mentioned in the "Correspondence Analysis Results".

It was confirmed that no original text for the "Japanese Personality Registration Name Kanji" referenced in the "Correspondence Analysis Results" exists. For the "National Administrative District Listing", Sasahara Hiroyuki of the fourth version's drafting committee examined the kanji that appeared on the in-progress development pages for the first standard. The committee also consulted many ancient writings, as well as many examples of personal names in a database of NTT phone books.

Due to this thorough investigation, the committee was able to pare down the number of kanji for which the source cannot be confidently explained to twelve, shown on the adjacent table. Of these, it is conjectured that several glyphs came about due to copying errors. In particular, 妛 was probably created when printers tried to create 𡚴 by cutting and pasting 山 and 女 together. A shadow from that process was misinterpreted as a line, resulting in 妛 (a picture of this can be found in the Jōyō kanji jiten).

Unification of kanji variants

According to the specifications in the fourth standard (1997), unification (包摂, hōsetsu, not the same term used for Unicode's "unification" although it is nearly the same concept) is the action of giving the same code point to a character without regard to its different character forms. In the fourth standard, the glyphs allowed are limited; the extent to which particular allographic glyphs are unified into a graphemic code point is clearly defined.

Furthermore, according to the specifications in the standard, a glyph (字体, jitai, lit. "character body") is an abstract notion as to the graphical representation of a graphic character; a character form (字形, jikei, lit. "character shape"; also a "glyph" in a sense, but differentiated on a different level for standardization purposes) is the representation as a graphical shape that a glyph takes in actuality (e.g. due to a glyph being handwritten, printed, displayed on a screen, etc.). For a single glyph, there exists an endless range of possible concretely and/or visibly different character forms. A variation between character forms of one glyph is termed a "design difference" (デザインの差, dezain no sa).

The extent to which a glyph is unified to one code point is determined according to that code point's "example glyph" (例示字体, reiji jitai) and the "unification criteria" (包摂規準, hōsetsu kijun) that can be applied to that example glyph; that is, the example glyph for a code point applies to that code point, and any glyphs for which the parts that compose the example glyph are replaced in accordance with the unification criteria also apply to that code point.

For example, the example glyph at 33-46 (僧) is composed of radical 9 (亻) and the kanji that eventually spawned the so kana (曽). Also, in unification criterion 101, there are three kanji displayed: the first takes the form most often seen in Japanese; the second is a more traditional form in which the first two strokes form radical 12 (the kanji numeral for the number 8: 八); and the third is like the second, except that radical 12 is inverted. Consequently, all three resulting forms of the character apply to the code point at row 33 cell 46.

In the fourth standard, including one of the errata for the first printing, there are 186 unification criteria.

When a code point's example glyph is composed of more than one part glyph, unification criteria can be applied to each part. After a unification criterion is applied to one part glyph, that part cannot have any more unification criteria applied to it. Also, a unification criterion is not allowed to apply if the resulting glyph would coincide with that of another code point entirely.

An example glyph is no more than an example for that code point; it is not a glyph "endorsed" by the standard. Also, the unification criteria need only be used for generally used kanji and for the purpose of assigning things to the code points of this standard. The standard requests that generally unused kanji not be created based on the example glyphs and unification criteria.

The kanji of the kanji set are not chosen completely consistently according to the unification criteria. For example, although 41-7 corresponds, according to unification criterion 72, both to the form where the third and fourth strokes cross and to the form where they do not, 20-73 corresponds only to the form where they do not cross, and 80-90 corresponds only to the form where they do.

The terms "unification", "unification criteria", and "example glyph" were adopted in the fourth standard. From the first to the third version, kanji and relations between kanji were grouped into three types: "independent" (独立, dokuritsu), "compatible" (対応, taiō), and "equivalent" (同値, dōchi); it was explained that the characters recognized as equivalent "consolidate to just one point". "Equivalence" included, other than kanji with exactly the same shape, kanji with differences due to style, and kanji where the difference in character form is small.

In the first standard, it was stipulated that "this standard ... does not establish the particulars of character forms" (Section 3.1); it also states that "the aim of this standard is to establish the general idea of characters and their codes; the design of their character forms and such lie outside its scope." The second and third standards likewise contain notes to the effect that specific designs of character forms lie outside their scope (the note on item 1). The fourth standard also stipulates that "This standard regulates graphic characters as well as their bit patterns, and the use, specific designs of individual characters, and so forth are not within the scope of this standard" (JIS X 0208:1997, item 1).

Unification criteria for compatibility

In the fourth standard, "unification criteria for maintaining compatibility with previous standards" (過去の規格との互換性を維持するための包摂規準, kako no kikaku to no gokansei wo iji suru tame no hōsetsu kijun) are defined. Their application is limited to 29 code points whose glyphs vary greatly between JIS C 6226-1978 and the revisions from JIS C 6226-1983 onward. For those 29 code points, the glyphs from JIS C 6226-1983 onward are displayed as "A", and the glyphs from JIS C 6226-1978 as "B". For each of them, either the "A" or the "B" glyph may be applied. However, in order to claim compatibility with the standard, whether the "A" or "B" form has been used for each code point must be explicitly noted.

Character encodings

Encoding schemes stipulated by JIS X 0208

In JIS X 0208:1997, article 7 combined with appendices 1 and 2 define a total of eight encoding schemes.

In the descriptions below, the "CL" (control left), "GL" (graphic left), "CR" (control right), and "GR" (graphic right) regions are respectively, in column/line notation, from 0/0 to 1/15, from 2/1 to 7/14, from 8/0 to 9/15, and from 10/1 to 15/14. For each code, 2/0 is assigned the graphic character "SPACE" and 7/15 the control character "DELETE". The C0 control characters (defined in JIS X 0211 and matching ISO/IEC 6429) are assigned to the CL region.
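The column/line notation used here maps directly onto byte values: column c and line l denote the byte 16 × c + l. The following is a minimal Python sketch of that arithmetic and of the region boundaries given above; the names `from_col_line` and `region` are illustrative and do not come from the standard.

```python
def from_col_line(col: int, line: int) -> int:
    """Convert column/line notation (e.g. 2/1) to a byte value (e.g. 0x21)."""
    if not (0 <= col <= 15 and 0 <= line <= 15):
        raise ValueError("column and line must each be in 0..15")
    return col * 16 + line


def region(byte: int) -> str:
    """Name the code-table region a byte value falls in, per the ranges above."""
    if byte == from_col_line(2, 0):
        return "SPACE"
    if byte == from_col_line(7, 15):
        return "DELETE"
    if from_col_line(0, 0) <= byte <= from_col_line(1, 15):
        return "CL"
    if from_col_line(2, 1) <= byte <= from_col_line(7, 14):
        return "GL"
    if from_col_line(8, 0) <= byte <= from_col_line(9, 15):
        return "CR"
    if from_col_line(10, 1) <= byte <= from_col_line(15, 14):
        return "GR"
    return "outside the four regions"  # 10/0 and 15/15


# ESC is written 1/11 in this notation; 4/1 is in GL and 10/3 is in GR.
print(hex(from_col_line(1, 11)), region(from_col_line(4, 1)), region(from_col_line(10, 3)))
```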

7-bit encoding for kanji
Stipulated in the standard itself. The JIS X 0208 double-byte set is assigned to the GL region.
8-bit encoding for kanji
Stipulated in the standard itself. Same as the 7-bit encoding, but defined in terms of 8-bit bytes. The CR region may be unused, or encode the C1 control characters from JIS X 0211. The GR region is unused.
International Reference Version + 7-bit encoding for kanji
Stipulated in the standard itself. The shift-in control character designates the ISO/IEC 646:1991 IRV (International Reference Version, equivalent to US-ASCII) to the GL region. The shift-out control character designates the JIS X 0208 double-byte set to the same region.
Latin characters + 7-bit encoding for kanji
Stipulated in the standard itself. As with IRV+7-bit, but with ISO/IEC 646:IRV replaced with ISO/IEC 646:JP (the Roman set of JIS X 0201).
International Reference Version + 8-bit encoding for kanji
Stipulated in the standard itself. ISO/IEC 646:IRV is assigned to the GL region, JIS X 0208 to the GR region. This is effectively a subset of EUC-JP, excluding the half-width katakana from JIS X 0201 and the supplemental kanji from JIS X 0212.
Latin characters + 8-bit encoding for kanji
Stipulated in the standard itself. As with IRV+8-bit, but with ISO/IEC 646:IRV replaced with ISO/IEC 646:JP.
Shift-coded character set
Stipulated in Appendix 1: "Shift-Coded Representation" (シフト符号化表現, Shifuto Fugōka Hyōgen). The authoritative definition of Shift JIS.
RFC 1468-coded character set
Stipulated in Appendix 2: "RFC 1468-Coded Representation" (RFC 1468符号化表現, RFC 1468 Fugōka Hyōgen). Resembles ISO-2022-JP (which is authoritatively defined in RFC 1468) but is defined in terms of eight-bit bytes, whereas ISO-2022-JP is defined in terms of seven-bit bytes.

Among the encodings stipulated in the fourth standard, only the "Shift" coded character set is registered by the IANA.[11] However, certain others are closely related to IANA-registered encodings defined elsewhere (EUC-JP and ISO-2022-JP).
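The relationship between these schemes can be sketched in Python as conversions from a row-cell pair to bytes. The GL and GR forms follow directly from the descriptions above (row and cell offset into 2/1–7/14 or 10/1–15/14). The Shift JIS arithmetic below is the conversion commonly used in practice and is not quoted from Appendix 1. All function names are illustrative assumptions.

```python
def kuten_to_jis7(row: int, cell: int) -> bytes:
    """Row-cell pair -> two bytes in the GL region (7-bit code for kanji)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    return bytes([row + 0x20, cell + 0x20])


def kuten_to_euc(row: int, cell: int) -> bytes:
    """Row-cell pair -> two bytes in the GR region (as in IRV + 8-bit code)."""
    j1, j2 = kuten_to_jis7(row, cell)
    return bytes([j1 | 0x80, j2 | 0x80])


def kuten_to_sjis(row: int, cell: int) -> bytes:
    """Row-cell pair -> Shift JIS bytes, using the customary arithmetic."""
    j1, j2 = kuten_to_jis7(row, cell)
    # Two rows share one lead byte; lead bytes skip the 0xA0-0xDF range.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 % 2:                 # odd row: trail bytes 0x40-0x9E, skipping 0x7F
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:                      # even row: trail bytes 0x9F-0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def kuten_to_iso2022jp(row: int, cell: int) -> bytes:
    """Row-cell pair wrapped in the escape sequences used by ISO-2022-JP."""
    return b"\x1b$B" + kuten_to_jis7(row, cell) + b"\x1b(B"


# Row 16 cell 1 is the first level 1 kanji.
for convert in (kuten_to_jis7, kuten_to_euc, kuten_to_sjis, kuten_to_iso2022jp):
    print(convert.__name__, convert(16, 1).hex())
```

Python's built-in `euc_jp`, `shift_jis`, and `iso2022_jp` codecs can be used to check that all four byte strings decode to the same character.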

Escape sequences for JIS X 0202 / ISO 2022

JIS X 0208 may be used within ISO 2022/JIS X 0202 (of which ISO-2022-JP is a subset). The escape sequences to designate JIS X 0208 to each of the four ISO 2022 code sets are listed below. Here, "ESC" refers to the control character "Escape" (0x1B, or 1/11).

ISO 2022 escape sequences to select JIS C 6226 and JIS X 0208
Standard | G0 | G1 | G2 | G3
78 | ESC 2/4 4/0 | ESC 2/4 2/9 4/0 | ESC 2/4 2/10 4/0 | ESC 2/4 2/11 4/0
83 | ESC 2/4 4/2 | ESC 2/4 2/9 4/2 | ESC 2/4 2/10 4/2 | ESC 2/4 2/11 4/2
90 onward | ESC 2/6 4/0 ESC 2/4 4/2 | ESC 2/6 4/0 ESC 2/4 2/9 4/2 | ESC 2/6 4/0 ESC 2/4 2/10 4/2 | ESC 2/6 4/0 ESC 2/4 2/11 4/2

The escape sequence starting ESC 2/4 selects a multi-byte character set. The escape sequence starting ESC 2/6 specifies a revision of the upcoming character set selection. JIS C 6226:1978 is identified by the multibyte-94-set identifier byte 4/0 (corresponding to ASCII @). JIS C 6226:1983 / JIS X 0208:1983 is identified by the multibyte-94-set identifier byte 4/2 (B). JIS X 0208:1990 is also identified by the 94-set identifier byte 4/2, but can be distinguished with the revision identifier 4/0 (@).
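As a rough illustration, the G0 designations above can be assembled as byte strings in Python. The helper names `col_line` and `designate_g0` are hypothetical; the final check relies on Python's built-in `iso2022_jp` codec, which emits the 1983 designation before JIS X 0208 text and returns to ASCII afterwards.

```python
ESC = 0x1B  # 1/11


def col_line(col: int, line: int) -> int:
    """Column/line notation to byte value, e.g. 2/4 -> 0x24."""
    return col * 16 + line


def designate_g0(final_byte: int, revision: int = None) -> bytes:
    """Build the escape sequence designating a multi-byte 94-set to G0.

    If a revision identifier is given, the ESC 2/6 sequence announcing the
    revision is placed before the designation itself.
    """
    seq = bytearray()
    if revision is not None:
        seq += bytes([ESC, col_line(2, 6), revision])
    seq += bytes([ESC, col_line(2, 4), final_byte])
    return bytes(seq)


JIS_1978 = designate_g0(col_line(4, 0))                          # ESC $ @
JIS_1983 = designate_g0(col_line(4, 2))                          # ESC $ B
JIS_1990 = designate_g0(col_line(4, 2), revision=col_line(4, 0)) # ESC & @ ESC $ B

# Compare with what the built-in codec produces for row 16 cell 1.
encoded = "\u4e9c".encode("iso2022_jp")
print(JIS_1978, JIS_1983, JIS_1990, encoded)
```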

Duplicate encodings of ASCII and JIS X 0201

When using the kanji set of this standard with either the ISO/IEC 646:1991 IRV graphic character set (ASCII) or JIS X 0201's graphic character set for Latin characters (JIS-Roman), the treatment of the characters common to both sets becomes problematic. Unless one takes special measures, the characters included in both sets do not all map to each other one-to-one, and a single character may be given more than one code point; that is, it may cause a duplicate encoding.

When a character is common to both sets, JIS X 0208:1997 in principle forbids the use of its code point in the kanji set (one of the two available code points), thereby eliminating the duplicate encoding. Characters that have the same name are judged to be the same character.

For example, both the name of the character corresponding to the bit pattern 4/1 in ASCII and the name of the character corresponding to row 3 cell 33 of the kanji set are "LATIN CAPITAL LETTER A". In International Reference Version + 8-bit code for kanji, whether by the bit pattern 4/1 or by the bit pattern corresponding to the kanji set's row 3 cell 33 (10/3 12/1), the letter "A" (i.e. "LATIN CAPITAL LETTER A") is represented. The standard forbids the use of the "10/3 12/1" bit pattern, in an attempt to eliminate the duplicate encoding.

In consideration of implementations that treat the characters of the code points in the kanji set as "full-width characters" and those of ASCII or JIS-Roman as different characters, the use of the kanji set code points is permitted only for the sake of backwards compatibility. For example, for the purpose of backwards compatibility, it is permitted to consider 10/3 12/1 in International Reference Version + 8-bit code for kanji to correspond to a full-width "A".

If the kanji set is used along with ASCII or JIS-Roman, then even if the standard is abided by strictly, the unique encoding of a character is not guaranteed. For example, in the International Reference Version + 8-bit code for kanji, it is valid to represent a hyphen with the bit pattern 2/13 for the character "HYPHEN-MINUS", as well as with the kanji set's row 1 cell 30 (bit pattern 10/1 11/14) for the character "HYPHEN". In addition, the standard does not define which of the two to use for what, and so the hyphen is not given one unique encoding. The same problem affects the minus sign, the quotation marks, and so forth.

Moreover, even if the kanji set is used as a separate code, there is no guarantee that each character has a unique encoding. For instance, in many cases the full-width "IDEOGRAPHIC SPACE" at row 1 cell 1 and the half-width space (2/0) coexist. How the two should differ is not self-evident, and is not specified in the standard.
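The duplicate encodings described in this section can be observed with Python's built-in `euc_jp` codec, which corresponds roughly to the IRV + 8-bit arrangement. This is a sketch of one decoder's behaviour, not of the standard's rules: the codec takes the "full-width" interpretation for the kanji set's Latin letters, and the character names printed are Unicode names rather than the names assigned by JIS X 0208.

```python
import unicodedata


def describe(raw: bytes) -> list:
    """Decode EUC-JP bytes and return the Unicode name of each character."""
    return [unicodedata.name(ch) for ch in raw.decode("euc_jp")]


samples = {
    "4/1 (single byte)": b"A",
    "row 3 cell 33 (10/3 12/1)": b"\xa3\xc1",
    "2/13 (single byte)": b"-",
    "row 1 cell 30 (10/1 11/14)": b"\xa1\xbe",
    "2/0 (single byte)": b" ",
    "row 1 cell 1 (10/1 10/1)": b"\xa1\xa1",
}

for label, raw in samples.items():
    text = raw.decode("euc_jp")
    folded = unicodedata.normalize("NFKC", text)
    print(f"{label}: {describe(raw)} -> NFKC {describe(folded.encode('euc_jp'))}")
```

Unicode compatibility normalization (NFKC) folds the full-width letter and the ideographic space onto their single-byte counterparts, but leaves HYPHEN distinct from HYPHEN-MINUS, which mirrors the unresolved hyphen case noted above.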

Comparison of encoding schemes used in practice

Encoding | Alternate name | 7-bit?[A] | ISO 2022? | Stateless?[B] | Accepts ASCII? | 0x00–7F always ASCII? | Superset of 8-bit JIS X 0201? | Supports JIS X 0212? | Bytewise self-synchronizing? | Bitwise self-synchronizing?
ISO-2022-JP | "JIS" (JIS X 0202) | Yes | Yes | No[C] | Yes | Sequences can be non-ASCII[C] | No (encoding possible)[D] | Possible[E] | No | No
Shift_JIS | "SJIS" | No | No | Yes | Almost[F] | Isolated bytes can be non-ASCII[G] | Yes | No | No | No
EUC-JP | "UJIS" (Unixized JIS) | No | Yes[H] | Yes[H] | Usually[I] | Yes | No (encoded)[J] | Usually available[K] | No | No
Unicode formats for comparison[L]
UTF-8 |  | No | No | Yes | Yes | Yes | No (encoded) | Available | Yes | Usually[M]
UTF-16 | "Unicode"[N] | No | No | Yes | No | No | No (encoded) | Available | Over 16-bit words only. | No
GB 18030 |  | No | No[O] | Yes | Yes | Isolated bytes can be non-ASCII | No (encoded) | Available | No | No
UTF-32 |  | No | No | Yes | No | No | No (encoded) | Available | Usually, in practice[P] | No
  A. i.e. does not require 8-bit clean transmission.
  B. i.e. the sequence used to encode a given character is always the same, no matter what the previous character(s) were. See state (computer science).
  C. ISO-2022-JP is a stateful encoding: all charsets are encoded over 0x21–7E and are switched between using ANSI escapes. Hence, while it is ASCII in its initial state, entire sequences of non-ASCII characters can be encoded with ASCII bytes.
  D. JIS X 0201 katakana are available in JIS X 0202 and ISO 2022, but not included in the basic ISO-2022-JP profile, although they are a common extension.
  E. JIS X 0212 is available in JIS X 0202 and ISO 2022, and included in the ISO-2022-JP-1 and ISO-2022-JP-2 profiles, but not in the basic ISO-2022-JP profile.
  F. Single byte characters 0x21–7E in Shift_JIS are properly ISO-646-JP, in order to be a superset of 8-bit JIS X 0201, but are often decoded (not necessarily displayed) as ASCII, which differs only in two places.
  G. Some (not all) ASCII bytes can appear as second bytes, but not first bytes, of double-byte characters in Shift_JIS. Hence in a sequence of two or more ASCII bytes, the second byte onward are necessarily ASCII (or ISO-646-JP) characters.
  H. Packed-format EUC is based on ISO 2022 mechanisms, with charset designations pre-arranged. Charset designation escapes and locking shifts are avoided, whereas use of single shifts can be implemented in a non-stateful manner. The constraints of ISO 2022 are nonetheless followed.
  I. Single byte characters 0x21–7E in EUC-JP are generally considered ASCII, but sometimes treated as ISO-646-JP.
  J. Unlike Shift_JIS, EUC-JP will not handle plain 8-bit JIS X 0201 input without prior conversion, due to the different representation of the JIS X 0201 katakana (with single-shifts).
  K. JIS X 0212 in EUC-JP is not always implemented.
  L. Besides the properties of the encodings themselves, Unicode formats have further advantages stemming from the underlying character set: they are not limited to JIS coded characters but can represent the entirety of UCS (including the full repertoire of JIS coded characters), and are hence suited to international use. They are also less badly affected by colliding proprietary extensions, due to their greater base repertoire and designated private use areas.
  M. Most bitwise frameshifts of UTF-8-encoded text will produce invalid UTF-8, but it is possible to construct sequences of characters that remain valid UTF-8 even when frameshifted by one or more bits.
  N. By Microsoft only.
  O. While GB 18030 and GBK are extensions of the EUC-CN form of GB/T 2312, they do not follow the constraints of EUC or ISO 2022, unlike EUC-JP (or the original EUC-CN).
  P. Although, in theory, UTF-32 is self-synchronizing over 32-bit dwords only, the use of a 32-bit value to represent a 21-bit value means that, in practice, UTF-32 contains a continuous run of at least 11 zero bits at the high end of each character, which can usually be used to align to character boundaries, depending on the codepoint(s) involved.

History

Within five years of a Japanese Industrial Standard being established, reaffirmed, or revised, it undergoes a process of reaffirmation, revision, or withdrawal. Since its establishment, JIS X 0208 has been revised three times, and at present the fourth standard is valid.

First standard

The first standard is JIS C 6226-1978 "Code of Japanese Graphic Character Set for Information Interchange" (情報交換用漢字符号系, Jōhō Kōkan'yō Kanji Fugōkei), established by the Japanese Minister of International Trade and Industry on 1 January 1978. It is also called 78JIS for short. Entrusted by the Agency of Industrial Science and Technology, a JIPDEC kanji code standardization research and study committee produced the draft. The committee chairman was Moriguchi Shigeichi.

The code included 453 non-kanji (including hiragana, katakana, the Roman, Greek and Cyrillic alphabets, and punctuation) and 6349 kanji (2965 level 1 kanji and 3384 level 2 kanji) for a total of 6802 characters.[12] It did not yet include box-drawing characters. The standard itself was set in Shaken Co., Ltd's Ishii Mincho typeface.

Second standard

The second standard JIS C 6226-1983 "Code of Japanese Graphic Character Set for Information Interchange" (情報交換用漢字符号系, Jōhō Kōkan'yō Kanji Fugōkei) revised the first standard on 1 September 1983. It is also called 83JIS. Entrusted by the AIST, a JIPDEC kanji code-related JIS committee produced the draft. The committee chairman was Motooka Tōru.

The draft of the second standard took into account factors such as the promulgation of the jōyō kanji, the enforcement of the jinmeiyō kanji, and the standardization of Japanese-language Teletex by the Ministry of Posts and Telecommunications; in addition, the following modifications were made to keep pace with JIS C 6234-1983 (24-pixel matrix printer character forms; presently JIS X 9052).

Addition of special characters
39 characters were added to the special characters. These were chosen, per JICST recommendations and from such standards as JIS Z 8201-1981 (mathematical symbols) and JIS Z 8202-1982 (quantity, unit, and chemical symbols), from among symbols that could not be represented by composition.
Newly added box-drawing characters
32 box-drawing characters were added.
Swapping of itaiji code points
Code points for 22 variant pairs of kanji were swapped, such that the variant in level 2 was moved to level 1 and vice versa.[12][13] For example, (level 1's) row 36 cell 59 in the first standard (壺) was moved to (level 2's) row 52 cell 68; the point originally at row 52 cell 68 (壷) was in turn moved to row 36 cell 59.
Additions to the level 2 kanji
Three characters from level 1 and one character from level 2 were given new code points at previously unassigned code points in row 84 as level 2 kanji. Itaiji for each of those code points were newly assigned to their original locations.[14] For example, row 84 cell 1 in the second standard (堯) was moved there to accommodate a different form not included in the first standard at row 22 cell 38 as a level 1 kanji (尭).
Modification of character forms
The character forms of approximately 300 kanji were amended.[15]

Among the changes in those 300 or so kanji character forms, many level 1 glyphs that were in the style of the Kangxi Dictionary were changed into variants, and especially more simplified forms (e.g. ryakuji and extended shinjitai). For example, a couple of code points that are often the subject of criticism due to being greatly changed are row 18 cell 10 (78JIS: 鷗, 83JIS: 鴎) and row 38 cell 34 (78JIS: 瀆, 83JIS: 涜).

There were many smaller changes away from the Kangxi-style variants; for example, row 25 cell 84 (鵠) lost part of a stroke. Also, where some glyphs for level 1 kanji were not Kangxi-style forms, there were some changed into their Kangxi-style forms; for example, row 80 cell 49 (靠) gained part of a stroke (i.e., the same part of the stroke that 25-84 lost).

In order to elucidate the original intent of the first standard, these differences were brought within the parameters of the unification criteria in the fourth standard. The difference in form between the examples noted above (25-84 and 80-49) falls under the parameters for unification criterion 42 (concerning the component "告").[s]

The bulk of the changes to character forms are differences between level 1 and level 2 kanji. Specifically, simplification was done more often for level 1 kanji than for level 2 kanji; simplifications applied to level 1 kanji were not generally applied to kanji in level 2. The aforementioned 25-84 (鵠) and 80-49 (靠) were likewise given different treatment, as the former is in level 1 and the latter is in level 2. Even so, there were some changes regardless of the level; for instance, characters containing the "door" (戸) and "winter" (冬) components were changed with no different treatment between level 1 and level 2 kanji.

However, for 29 code points (such as the problematic 18-10 and 38-34 mentioned above), the forms inherited by the fourth standard contradict the original intent of the first. For these, there are special unification criteria to maintain compatibility with the previous standards at these code points.

When the new "X" category for Japanese Industrial Standards (for information-related fields) was introduced, the second standard was re-termed JIS X 0208-1983[12] on 1 March 1987.

Third standard

The third standard JIS X 0208-1990 "Code of Japanese Graphic Character Set for Information Interchange" (情報交換用漢字符号, Jōhō Kōkan'yō Kanji Fugō) revised the second standard on 1 September 1990. It is also called 90JIS for short. Entrusted by the AIST, a committee at the Japanese Standards Association for the revision of JIS X 0208 created the draft. The committee chairman was Tajima Kazuo.

225 kanji glyphs were changed, and two characters were added to level 2 (84-05 "凜" and 84-06 "熙"). This was a disunification of itaiji for two characters already included (49-59 "凛" and 63-70 "煕"). Some of the changes and the two additions corresponded to the 118 jinmeiyō kanji added in March 1990.[12] The standard itself was set in Heisei Mincho.

Fourth standard

The fourth standard JIS X 0208:1997 "7-bit and 8-bit double byte coded KANJI sets for information interchange" (7ビット及び8ビットの2バイト情報交換用符号化漢字集合, Nana-Bitto Oyobi Hachi-Bitto no Ni-Baito Jōhō Kōkan'yō Fugōka Kanji Shūgō) revised the third standard on 20 January 1997. It is also called 97JIS for short. Entrusted by the AIST, a JSA committee for research and study of coded character sets produced the draft. The committee chairman was Shibano Kōji.

The basic policies of this revision were to make no changes to the character set, to clarify ambiguous provisions, and to make the standard easier to use. No characters were added or removed, no code points were rearranged, and the example glyphs were left unchanged without exception. However, the stipulations of the standard were completely rewritten and supplemented. Whereas the third standard was 65 pages long without the explanations, the fourth standard was 374 pages without the explanations.

The main points of the revision are:

Definition of encoding methods
Until the third standard, only the encoding method based on JIS X 0202 code extension was defined. This is something unusual as far as coded character sets go. In the fourth standard, encoding methods that do not use escape sequences for the purpose of code extension were defined.
Definition of the general prohibition of the use of unassigned code points and methods of usage for unassigned code points
In an explanation that was not part of the standard proper, the third standard implied that assigning gaiji to some unassigned code points was acceptable. The fourth standard clarified that use of unassigned code points is generally prohibited, and specified the conditions under which unassigned code points may be used.
General elimination of duplicate encodings
Each character was given a "character name" that maps to those of other standards. Also, encoding methods to use them together with the ISO/IEC 646's International Reference Version or JIS X 0201 were specified. When JIS X 0208 is used together with either, among two assigned code points for characters with the same name, only one is permitted; thus, duplicate encodings were generally eliminated.
Investigation into sources of kanji
Characters included in the standard so far that are found in neither the Kangxi Dictionary nor the Dai Kanwa Jiten were identified. Accordingly, exactly with what purpose for inclusion and from which sources these kanji came during compilation of the first standard was investigated.
Definition of kanji unification criteria
Based on things such as the materials for the drafting of the first standard, an attempt was made to restore the intent of the first standard for the scope of the glyphs each code point represents. Moreover, the criteria for unifying kanji glyphs were clearly defined.
Inclusion of de facto standards
By the time of the fourth standard, the encoding methods Shift JIS and ISO-2022-JP had become de facto standards for personal computing and e-mail, respectively. These encoding methods were included as "Shift-Coded Representation" and "RFC 1468-Coded Representation" (described above).

Successors

JIS X 0213 (extended kanji) was designed "with the goal being to offer a sufficient character set for the purposes of encoding the modern Japanese language that JIS X 0208 intended to be from the start";[16] it defines a character set that expands upon the kanji set of JIS X 0208. The drafters of JIS X 0213 recommend migration from JIS X 0208 to JIS X 0213, among the advantages being JIS X 0213's compatibility with the Hyōgai Kanji Glyph List and with newer jinmeiyō kanji.

Contrary to the expectations of the drafters, adoption of JIS X 0213 has been anything but fast since its enactment in the year 2000. The drafting committee of JIS X 0213:2004 wrote (in the year 2004), "The status where 'what the majority of information systems can use in common is JIS X 0208 only' still continues." (JIS X 0213:2000, Appendix 1:2004, section 2.9.7)

For Microsoft Windows, the predominant operating system (and hence the supplier of the predominant desktop environment) in the personal computing sector, the JIS X 0213 repertoire has been included since Windows Vista, released in November 2006. Mac OS X has been compatible with JIS X 0213 since version 10.1 (released in 2001). Many Unix-like systems such as Linux can optionally support JIS X 0213. Therefore, it is thought that with time, JIS X 0213 support on personal computers will not be an impediment to its eventual adoption.

Among the drafters of JIS X 0213, there are those who expect to see a mix of JIS X 0208 and JIS X 0213 before any adoption of JIS X 0213 (Satō, 2004). However, JIS X 0208 continues to be used for the present, and many predict it to endure as a standard. There are barriers that need to be overcome if JIS X 0213 is to supplant JIS X 0208 in common usage:

  • The character repertoires utilized in Japanese mobile phones at the present time are based on JIS X 0208. There are no officially announced plans to migrate these to JIS X 0213 compatibility. As mobile phones are now a pervasive aspect of Japanese textual communication (see Japanese mobile phone culture), being a widespread, commonly accessed medium for sending e-mail and accessing the World Wide Web, a lack of adoption for mobile phones deters usage elsewhere.
  • JIS X 0213 is not strictly upward-compatible with JIS X 0208 in terms of unification criteria (see below). For large-scale archives (e.g. bibliographic databases and Aozora Bunko) that use JIS X 0208 and follow its unification criteria strictly, it is thought that it would be extremely difficult work to both convert all the data to JIS X 0213 and preserve the same standard of textual integrity.
  • In practice, many systems define and use unassigned code points in JIS X 0208. For example, Windows assigns IBM and NEC extended characters and user-defined character areas (see Windows-932), and mobile phones assign emoji in some such places. The code points of these gaiji conflict with the code points that JIS X 0213 codes use, so there would be some difficulty in migrating these systems from JIS X 0208 to JIS X 0213. There are also plans to migrate to UCS/Unicode and use the JIS X 0213 repertoire from there, but until a system administrator is able to judge that the implementations of UCS/Unicode surrogate pairs and character compositions are sufficiently stable, he or she is likely to hesitate to use the repertoire of JIS X 0213 that requires those implementations.
  • The improvements provided by JIS X 0213 are mostly in the realm of characters that are not used as often as the ones already present in JIS X 0208. Because there are nearly twice as many glyphs that need to be implemented for less usage of those extra glyphs, it can be a low return on investment in many cases, especially where resources are constrained.

Implementations

Because JIS X 0208 / JIS C 6226 is primarily a character set and not a strictly defined character encoding, several companies have implemented their own encodings of the character set.

Several of these incorporate vendor-specific character assignments in place of unallocated regions of the standard. These include Windows-932 and MacJapanese, as well as NEC's PC98 character encoding. While IBM-932 and IBM-942 also include vendor assignments, they include them outside of the region used for JIS X 0208.

Relation to other standards

ISO/IEC 646 IRV and ASCII

As noted above, the kanji set is not upwardly compatible with the ISO/IEC 646:1991 IRV (ASCII) graphic character set. The kanji set and the IRV graphic character set can be used together as specified in JIS X 0208 (IRV + 7-bit code for kanji and IRV + 8-bit code for kanji). They can be used together in EUC-JP as well.

JIS X 0201

The kanji set lacks three characters included in JIS X 0201's graphic character set for Latin characters: 2/2 (QUOTATION MARK), 2/7 (APOSTROPHE), and 2/13 (HYPHEN-MINUS). The kanji set contains all characters included in JIS X 0201's graphic character set for katakana.

The kanji set and the graphic character set for Latin characters can be used together as specified in JIS X 0208 (Latin characters + 7-bit code for kanji and the Latin characters + 8-bit code for kanji). The kanji set, graphic character set for Latin characters, and JIS X 0201's graphic character set for katakana can be used together as specified in JIS X 0208 (the shift-coded character set; i.e. Shift JIS). The kanji set and graphic character set for katakana can be used together in EUC-JP.
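The different ways Shift JIS and EUC-JP combine the JIS X 0201 katakana with the kanji set can be seen with Python's built-in codecs. This is a small sketch of those codecs' behaviour rather than a restatement of either specification; the variable names are illustrative.

```python
HALFWIDTH_A = "\uff71"  # JIS X 0201 katakana letter A (half-width)
FULLWIDTH_A = "\u30a2"  # katakana letter A from the kanji set (row 5 cell 2)

# Shift JIS keeps the JIS X 0201 katakana as single bytes in 0xA1-0xDF.
sjis_half = HALFWIDTH_A.encode("shift_jis")
sjis_full = FULLWIDTH_A.encode("shift_jis")

# EUC-JP prefixes each JIS X 0201 katakana byte with the single shift 0x8E.
euc_half = HALFWIDTH_A.encode("euc_jp")
euc_full = FULLWIDTH_A.encode("euc_jp")

print(sjis_half.hex(), sjis_full.hex(), euc_half.hex(), euc_full.hex())

# A bare 8-bit JIS X 0201 katakana byte is not valid EUC-JP on its own.
try:
    sjis_half.decode("euc_jp")
    bare_byte_accepted = True
except UnicodeDecodeError:
    bare_byte_accepted = False
print(bare_byte_accepted)
```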

JIS X 0212

JIS X 0212 (supplementary kanji) defines additional characters with code points for the purposes of information processing that requires characters not found in JIS X 0208. Rather than allocating characters within the main JIS X 0208 kanji set, it defines a second 94-by-94 kanji set containing supplementary characters.

JIS X 0212 can be used with JIS X 0208 in EUC-JP. Also, JIS X 0208 and JIS X 0212 are both source standards for UCS/Unicode's Han unification, meaning that kanji from both sets can be included in one Unicode-format document.

Among the code points that the second version of JIS X 0208 changed, 28 code points in JIS X 0212 reflect the character forms from before the changes.[17] Also, JIS X 0212 reassigns the "closure mark" that JIS X 0208 had assigned as a non-kanji (〆, at row 1 cell 26) as a kanji (乄, at row 16 cell 17). JIS X 0212 has no characters in common with JIS X 0208 other than these. Hence, it is not suited for general use on its own.

However, in the fourth version of JIS X 0208, the connection to JIS X 0212 was not defined at all. It is believed that this is because the drafting committee of the fourth JIS X 0208 standard had a critical opinion of the selection and identification methods of JIS X 0212.[18] The character meanings and selection rationales were not properly documented, making it difficult to identify whether desired kanji corresponded to those in its repertoire.[19] The text of the fourth standard, as well as pointing out the problematic points of the character selection of JIS X 0212, states that "it is thought that not only is character selection impossible, it is also impossible to use together; the connection to JIS X 0212 is not defined at all." (section 3.3.1)

JIS X 0213

 
Euler diagram comparing repertoires of JIS X 0208, JIS X 0212, JIS X 0213, Windows-31J, the Microsoft standard repertoire and Unicode.

JIS X 0213 (extension kanji) defines a kanji set that expands upon the kanji set of JIS X 0208. According to this standard, it is "designed with the goal being to offer a sufficient character set for the purposes of encoding the modern Japanese language that JIS X 0208 intended to be from the start."[16]

The kanji set of JIS X 0213 incorporates all characters that can be represented in the kanji set of JIS X 0208, with many additions. In total, JIS X 0213 defines 1183 non-kanji and 10,050 kanji (for a total of 11,233 characters), within two 94-by-94 planes (面, men). The first plane (non-kanji and level 1–3 kanji) is based on JIS X 0208, whereas the second plane (level 4 kanji) is designed to fit within the unallocated rows of JIS X 0212, allowing use in EUC-JP.[20] JIS X 0213 also defines Shift_JISx0213, a variant of Shift_JIS capable of encoding the entirety of JIS X 0213.

For most intents and purposes, JIS X 0213 plane 1 is a superset of JIS X 0208. However, different unification criteria are applied to some code points in JIS X 0213 compared to JIS X 0208. Consequently, some pairs of kanji glyphs that were represented by one JIS X 0208 code point, due to being unified, are given separate code points in JIS X 0213. For example, the glyph at row 33 cell 46 of JIS X 0208 ("僧", described above) unifies a few variants that differ in their right-hand component. In JIS X 0213, two of these forms are unified on plane 1 row 33 cell 46, and the other is located at plane 1 row 14 cell 41. Therefore, whether JIS X 0208 row 33 cell 46 should be mapped to JIS X 0213 plane 1 row 33 cell 46 or plane 1 row 14 cell 41 cannot be determined automatically.[t] This limits the extent to which JIS X 0213 can be considered upwardly compatible with JIS X 0208, as admitted by the JIS X 0213 drafting committee.[21]

However, for the most part, row m cell n in JIS X 0208 corresponds to plane 1 row m cell n in JIS X 0213; therefore, not much confusion arises in practice. This is because most typefaces have come to use the glyphs exemplified in JIS X 0208, and most users are not consciously aware of the unification criteria.
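The row-and-cell correspondence can be spot-checked with a short Python sketch. It assumes Python's `euc_jp` and `euc_jisx0213` codecs as representatives of the two standards; `kuten` is a hypothetical helper written for this example. Because the check goes through Unicode, it cannot reveal the unification ambiguity described above; it only shows that the same code points land on the same row and cell.

```python
def kuten(ch, codec):
    """Return (row, cell) for a two-byte EUC-style encoding of ch."""
    raw = ch.encode(codec)
    if len(raw) != 2:
        raise ValueError(f"{ch!r} is not a two-byte code in {codec}")
    # EUC-style bytes are the row or cell number plus 0xA0.
    return raw[0] - 0xA0, raw[1] - 0xA0

for ch in "亜僧ア、":
    old = kuten(ch, "euc_jp")        # JIS X 0208 row and cell
    new = kuten(ch, "euc_jisx0213")  # JIS X 0213 plane 1 row and cell
    print(ch, old, new, old == new)
```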

ISO/IEC 10646 and Unicode

The kanji set of JIS X 0208 is among the original source standards for the Han unification in ISO/IEC 10646 (UCS) and Unicode. Every kanji in JIS X 0208 corresponds to its own code point in UCS/Unicode's Basic Multilingual Plane (BMP).
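One way to check the BMP claim empirically is to decode every cell of the kanji rows. The sketch below assumes that Python's `euc_jp` codec reflects the JIS X 0208 kanji repertoire; `decode_kanji_rows` is a hypothetical helper, and the count it prints depends on the codec's mapping table rather than on the standard's text.

```python
def decode_kanji_rows(codec="euc_jp"):
    """Decode every cell of rows 16 to 84 and return the assigned characters."""
    found = {}
    for row in range(16, 85):
        for cell in range(1, 95):
            raw = bytes([row + 0xA0, cell + 0xA0])
            try:
                found[(row, cell)] = raw.decode(codec)
            except UnicodeDecodeError:
                pass  # unassigned cell in this codec
    return found

kanji = decode_kanji_rows()
print(len(kanji))  # number of assigned kanji cells according to the codec
# Every kanji is a single code point inside the BMP (at most U+FFFF).
print(all(len(ch) == 1 and ord(ch) <= 0xFFFF for ch in kanji.values()))
# No two cells share a code point.
print(len(set(kanji.values())) == len(kanji))
```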

The non-kanji in JIS X 0208 also correspond to their own code points in the BMP. However, for some special characters, some systems implement correspondences that differ from those of UCS/Unicode (which are based on the character names given in JIS X 0208:1997).
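The divergence can be observed by decoding the same bytes with two mapping tables. The sketch below assumes Python's `shift_jis` codec as a stand-in for the JIS/Unicode correspondence and `cp932` as a stand-in for the Microsoft one; the three cells chosen are examples and not an exhaustive list.

```python
import unicodedata

# Shift JIS byte pairs for three row 1 cells whose mappings differ by vendor.
cells = {
    "1-33": b"\x81\x60",
    "1-34": b"\x81\x61",
    "1-61": b"\x81\x7c",
}

for kuten, raw in cells.items():
    jis = raw.decode("shift_jis")  # JIS/Unicode-style mapping
    ms = raw.decode("cp932")       # Microsoft-style mapping
    print(kuten,
          f"U+{ord(jis):04X}", unicodedata.name(jis),
          "|",
          f"U+{ord(ms):04X}", unicodedata.name(ms))
```

Text that is decoded with one table and re-encoded with the other may therefore fail to round-trip for these characters, which is why converters often treat them as special cases.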

Footnotes

Explanatory

  1. ^ a b c d (Withdrawn)
  2. ^ JIS and Apple: U+2014.
    Unicode,[a] Microsoft and WHATWG: U+2015.
  3. ^ Microsoft and WHATWG: U+FF5E.
    Unicode,[a] JIS and Apple: U+301C.
  4. ^ Microsoft and WHATWG: U+2225.
    Unicode,[a] JIS and Apple: U+2016.
  5. ^ Microsoft: U+FF0D.
    Unicode,[a] JIS and Apple: U+2212.
    WHATWG: U+FF0D on decoding, exceptionally both on encoding.
  6. ^ a b c d Added in JIS X 0213
  7. ^ Absent in original version of extension, which predates the Heisei era. Code position selected by either NEC or Microsoft.[5] Not in Macintosh PostScript.
  8. ^ a b c d e f g h i Duplicated by additions made to row 2 in 1983. Not encoded here (but left unallocated) in JIS X 0213,[5] but duplicate-encoded here by Microsoft and WHATWG. As for the Macintosh PostScript encoding, a Private Use U+F87F is appended to the form decoded with the macOS library functions to allow round-tripping.
  9. ^ As shown in the code tables registered at the International Register of Coded Character Sets To Be Used With Escape Sequences, prior to the fourth standard (1997), the ku (区) and ten (点) were called "section" and "position" respectively in English. As background to the change in the English terms: in the JIS X 0221-1995 (UCS) standard, which translated ISO/IEC 10646-1:1993, "group", "plane", "row", and "cell" can be translated into gun (群), men (面), ku (区), and ten (点). However, the row and cell of JIS X 0208 and the row and cell of the UCS are different ideas.
  10. ^ Character names are given in Roman letters and are used internationally, so they can be considered an international convention, somewhat like the scientific names of living organisms. In regard to this analogy, the Japanese common names for the characters would be like using common names for organisms.
  11. ^ For a fully featured kana-order search or sort, word readings, repetition marks, and so forth must be taken into account. The sorting of Japanese character strings is prescribed in JIS X 4061 (Collation of Japanese character strings).
  12. ^ According to Yasuoka (2001a), it seems there were some accidental oversights. He notes, for example, that the ba (旛, 58-57) of Inba and the shi (泗, 61-89) of Shisui, Kumamoto are not part of level 1.
  13. ^ List: 丼󠄀傲󠄀刹󠄀哺󠄀喩󠄀嗅󠄀嘲󠄁毀󠄀彙󠄀恣󠄀惧󠄀慄󠄀憬󠄀拉󠄀摯󠄁曖󠄀楷󠄀鬱󠄀璧󠄀瘍󠄀箋󠄀籠󠄀緻󠄀羞󠄀訃󠄀諧󠄀貪󠄀踪󠄀辣󠄀錮
  14. ^ The jōyō kanji 𠮟󠄀 is included only in its official variant form 叱.
  15. ^ List: 乘󠄀亞󠄀佛󠄀侑󠄀來󠄀俐󠄀傳󠄀僞󠄀價󠄀儉󠄀兒󠄀凉󠄀凛󠄀凰󠄀剩󠄀劍󠄀勁󠄀勳󠄀卷󠄀單󠄀嚴󠄀圈󠄀國󠄀圓󠄀團󠄀壞󠄀壘󠄀壯󠄀壽󠄀奎󠄀奧󠄀奬󠄀孃󠄀實󠄀寢󠄀將󠄀專󠄀峽󠄀崚󠄀巖󠄀巫󠄀已󠄀帶󠄀廣󠄀廳󠄀彈󠄀彌󠄀彗󠄀從󠄀徠󠄀恆󠄀惡󠄀惠󠄀惺󠄀愼󠄀應󠄀懷󠄀戰󠄀戲󠄀拔󠄁拜󠄀拂󠄀搜󠄀搖󠄀攝󠄀收󠄀敍󠄀昊󠄀昴󠄀晏󠄀晄󠄀晝󠄀晨󠄀晟󠄀暉󠄀曉󠄀檜󠄀栞󠄀條󠄀梛󠄀椰󠄀榮󠄀樂󠄀樣󠄀橙󠄀檢󠄀櫂󠄀櫻󠄀盜󠄀毬󠄀氣󠄀洸󠄀洵󠄀淨󠄀渾󠄀滉󠄀漱󠄀滯󠄀澁󠄀澪󠄀濕󠄀煌󠄀燒󠄀燎󠄀燿󠄀爭󠄀爲󠄀狹󠄀默󠄀獸󠄀珈󠄀珀󠄀琥󠄀瑶󠄀疊󠄀皓󠄀盡󠄀眞󠄁眸󠄀碎󠄀祕󠄀祿󠄀禪󠄀禮󠄀稟󠄀稻󠄀穗󠄀穰󠄀穹󠄀笙󠄀粹󠄀絆󠄀綺󠄀綸󠄀縣󠄀縱󠄀纖󠄀羚󠄀翔󠄀飜󠄀聽󠄀脩󠄀臟󠄀與󠄀苺󠄀茉󠄀莊󠄀莉󠄀菫󠄀萠󠄀萬󠄀蕾󠄀藏󠄀藝󠄀藥󠄀衞󠄀裝󠄀覽󠄀詢󠄀諄󠄀謠󠄀讓󠄀賣󠄀赳󠄀轉󠄀迪󠄀逞󠄀醉󠄀釀󠄀釉󠄀鎭󠄀鑄󠄀陷󠄀險󠄀雜󠄀靜󠄀頌󠄀顯󠄀颯󠄀騷󠄀驍󠄀驗󠄀髮󠄀鷄󠄀麒󠄀黎󠄀齊󠄀堯󠄀槇󠄀遙󠄀凜󠄀熙
  16. ^ List: 焰󠄀鷗󠄀俠󠄀繫󠄀繡󠄀渚󠄀蔣󠄀醬󠄀蟬󠄀琢󠄀簞󠄀摑󠄀顚󠄀禱󠄀萊󠄀蠟󠄀增󠄀德󠄀橫󠄀瀨󠄀猪󠄀神󠄀祥󠄀福󠄁綠󠄀緖󠄀薰󠄀諸󠄀賴󠄀郞󠄀都󠄀黑󠄀逸󠄁謁󠄀緣󠄀黃󠄀溫󠄀禍󠄀悔󠄀海󠄀渴󠄀漢󠄁器󠄁祈󠄀虛󠄀響󠄁勤󠄁謹󠄀揭󠄀擊󠄀穀󠄀祉󠄁視󠄁煮󠄀社󠄁者󠄁臭󠄁祝󠄀暑󠄁署󠄀涉󠄀狀󠄀節󠄁祖󠄁僧󠄁層󠄁巢󠄀憎󠄀贈󠄁卽󠄀嘆󠄀著󠄁徵󠄀禎󠄁突󠄁難󠄀梅󠄀繁󠄁晚󠄀卑󠄀碑󠄀賓󠄀敏󠄀侮󠄁勉󠄀步󠄀墨󠄀每󠄀祐󠄀欄󠄀虜󠄀淚󠄀類󠄀曆󠄀歷󠄀練󠄀鍊󠄀錄󠄀俱󠄀瘦󠄀吞󠄀寬󠄀廊󠄁朗󠄀懲
  17. ^ For row 19 cells 30 and 31, the order is mixed up for their representative readings. Consequently, where the correct order should be kaeru (蛙, "frog") followed by kaori (馨, "aroma"), their positions are transposed so that kaori precedes kaeru.
  18. ^ In addition, the primarily used variant (剣) is at row 23 cell 85 on level 1, and one other variant (釼) can be found grouped as having the "gold" radical at row 78 cell 63 on level 2.
  19. ^ The question of which glyphs within the unification criteria are to be used is left to the type designer. Depending on that (and the end-user's circumstances), it is possible that neither, both, one, or the other of these two will follow their Kangxi-style form.
  20. ^ This is the same uncertainty as to whether the "HYPHEN-MINUS" in ISO/IEC 646 should be mapped to "HYPHEN" or "MINUS SIGN" in JIS X 0208.

Reference footnotes

  1. ^ "Why Japan didn't create the iPod". Gatunka. 5 May 2008.
  2. ^ JIS X 0208 was not one of the standards included in the announced by the Ministry of Economy, Trade and Industry on 17 January 2007.
  3. ^ a b c Steele, Shawn (15 April 1998). "CP932.TXT: cp932 to Unicode table". Microsoft. (codes in Shift_JIS format; SJIS 0x815C = 1-29 = JIS 0x213D; SJIS 0x817C = 1-61 = JIS 0x215D)
  4. ^ a b "Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later". Apple. (codes in Shift_JIS format; SJIS 0x815C = 1-29 = JIS 0x213D; SJIS 0x817C = 1-61 = JIS 0x215D)
  5. ^ a b c d Lunde, Ken (21 March 2019). "A Brief History of Japan's Era Name Ligatures". CJK Type Blog. Adobe Inc.
  6. ^ a b c Japanese Industrial Standard Committee. ISO-IR-233: Japanese Graphic Character Set for Information Interchange, Plane 1 (Update of ISO-IR 228) (PDF). ITSCJ/IPSJ.
  7. ^ Unicode, Inc. (14 October 2011). "JIS X 0208 (1990) to Unicode".
  8. ^ van Kesteren, Anne, "Index jis0208", Encoding Standard, WHATWG
  9. ^ a b Jungshik Shin (14 October 2011). "KSX1001.TXT: KS X 1001 to Unicode table". Unicode, Inc.
  10. ^ JIS C 6225-1979 (control character codes for the purpose of the Japanese graphic character set for information interchange) provided control characters for the beginning and end of composition. JIS C 6225 was re-termed JIS X 0207 in 1987, and was withdrawn in 1997.
  11. ^ In the IANA character sets, Shift JIS is defined by referring to JIS X 0208:1997 Appendix 1.
  12. ^ a b c d "15. History of JIS X 0208" (PDF), IBM Japanese Graphic Character Set for Extended UNIX Code (EUC), IBM, p. 371, archived (PDF) from the original on 8 December 2017, retrieved 8 December 2017
  13. ^ Lunde, Ken. "Appendix Q § 78-vs-83-3". CJKV Information Processing (supplementary material). O'Reilly. Note inclusion of kuten codes with hyphen omitted.
  14. ^ Lunde, Ken. "Appendix Q § 78-vs-83-2". CJKV Information Processing (supplementary material). O'Reilly. Note inclusion of kuten codes with hyphen omitted.
  15. ^ According to Nomura (1984), the number of character forms changed, including moves between code points, is 294. According to Shibano (1997a) and the text of the fourth standard, the number of character forms changed is 300.
  16. ^ a b Original Japanese: 「JIS X 0208が当初符号化を意図していた現代日本語を符号化するために十分な文字集合を提供することを目的として設計された」
  17. ^ Lunde, Ken. "Appendix Q § TJ2". CJKV Information Processing (supplementary material). O'Reilly. Note inclusion of kuten codes with hyphen omitted.
  18. ^ For example, Shibano Kōji (1997a), who served as the chairman of the drafting committee for the fourth standard, stated these about the selection method: "It is based on a superficial understanding of JIS X 0208's character set selection; it is a mistaken understanding" (original Japanese: 「JIS X 0208の文字集合選定の表層的理解に基づくものであり、間違った理解である」) and "There is a big problem in investigating all of a character set that exceeds 10000 characters." (original Japanese: 「1万字を越える水準の文字集合の検討としては、大きな問題がある」)
  19. ^ Marukawa, Kazushi. . Archived from the original on 22 May 2005.
  20. ^ Chang, Hyeshik (31 October 2021). "Readme for CJKCodecs". cPython. Python Software Foundation.
  21. ^ JIS X 0213:2000 section 5.3.2, JIS X 0213:2000 Appendix 1:2004 section 3.2.2

See also

  • JIS coded character sets
    • JIS X 0201 "7-bit and 8-bit coded character sets for information interchange"
    • JIS X 0202 "Information technology – Character code structure and extension techniques" (ISO/IEC 2022)
    • JIS X 0208 "7-bit and 8-bit double byte coded KANJI sets for information interchange"
    • JIS X 0211 "Control functions for coded character sets" (ISO/IEC 6429)
    • JIS X 0212 "Code of the supplementary Japanese graphic character set for information interchange"
    • JIS X 0213 "7-bit and 8-bit double byte coded extended KANJI sets for information interchange"
    • JIS X 0221 "Universal Multiple-Octet Coded Character Set (UCS)" (ISO/IEC 10646)
  • Extended shinjitai
  • Help:Japanese

References

For the purposes of citation, these Japanese names are presented as if they were in Western order where Romanized, and retain Eastern order where not.

  • Nishimura, Hirohiko [西村 恕彦], 1978. The Kanji JIS [漢字のJIS]. Standardization Journal [標準化ジャーナル], 171: 3–8.
  • Nomura, Masaaki [野村 雅昭], 1984. Revision of JIS C 6226: Kanji codes for information interchange [JIS C 6226 情報交換用漢字符号系の改正]. Standardization Journal [標準化ジャーナル], 14 (3): 4–9.
  • Ogata, Katsuhiro [小形 克宏], 2006a. Things that were not unified in 97JIS among the example glyphs changed in JIS C 6226-1983 (83JIS) [JIS C 6226-1983 (83JIS) で例示字体を変更したうち、97JISで包摂とされなかったもの][permanent dead link] (accessed 29 January 2007).
  • Ogata, Katsuhiro [小形 克宏], 2006b. Things that fell within the scope of unification among the example glyphs changed in JIS C 6226-1983 (83JIS) [JIS C 6226-1983 (83JIS) 例示字体変更のうち、包摂の範囲内だったもの][permanent dead link] (accessed 29 January 2007).
  • Satō, Takayuki [佐藤 敬幸], 2004. Concerning the revision of JIS X 0213 (7-bit and 8-bit double byte coded extended Kanji sets for information interchange) [JIS X 0213 (7ビット及び8ビットの2バイト情報交換用符号化拡張漢字集合) の改正について]. Standardization Journal [標準化ジャーナル], 34 (4): 8–12.
  • Shibano, Kōji [芝野 耕司], 1997a. Concerning the revision of JIS X 0208 (7-bit and 8-bit double byte coded Kanji sets for information interchange) [JIS X0208 (7ビット及び8ビットの2バイト情報交換用符号化漢字集合) の改正について]. Standardization Journal [標準化ジャーナル], 27 (3): 8–12.
  • Shibano, Kōji [芝野 耕司], 1997b. Plan for the extension of the JIS kanji [JIS漢字の拡張計画]. Standardization Journal [標準化ジャーナル], 27 (7): 5–11.
  • Shibano, Kōji [芝野 耕司], 2000. Establishment of JIS X 0213 (7-bit and 8-bit double byte coded extended Kanji sets for information interchange) [JIS X 0213 (7ビット及び8ビットの2バイト情報交換用符号化拡張漢字集合) の制定]. Standardization Journal [標準化ジャーナル], 30 (3): 3–7.
  • Shibano, Kōji [芝野 耕司], 2001. Concerning JIS kanji [漢字について]. Standardization and Quality Control [標準化と品質管理], 54 (8): 44–50.
  • Shibano, Kōji [芝野 耕司] (editor), 2002. JIS Kanji Dictionary, enlarged and revised edition [増補改訂 JIS漢字字典]. Tokyo: Japanese Standards Association (ISBN 4-542-20129-5).
  • Shibano, Kōji [芝野 耕司], 2002. The development of kanji and Japanese language processing technologies: the standardization of kanji codes [漢字・日本語処理技術の発展: 漢字コードの標準化]. IPSJ Magazine [情報処理], 43 (12): 1362–1367
  • Tajima, Kazuo [田嶋 一夫], 1979. Problems concerning the use of the JIS kanji listing: design and handling of kanji in kanji processing systems [JIS漢字表の利用上の問題: 漢字処理システムにおける漢字のデザインと管理]. Journal of Information Processing Society of Japan [情報管理], 21 (10): 753–761.
  • Uchida, Tomio [内田 富雄], 1990. Establishment of JIS X 0212 (Kanji Codes for Information Interchange – Supplemental Kanji) [JIS X 0212 (情報交換用漢字符号―補助漢字) の制定]. Standardization Journal [標準化ジャーナル], 20 (11): 6–11.
  • Yasuoka, Kōichi [安岡 孝一], 2001a. Situation of the Newest Character Codes in Japan (former part) [日本における最新文字コード事情 (前編)]. Systems, Control and Information [システム/制御/情報], 45 (9): 528–535.
  • Yasuoka, Kōichi [安岡 孝一], 2001b. Situation of the Newest Character Codes in Japan (latter part) [日本における最新文字コード事情 (後編)]. Systems, Control and Information [システム/制御/情報], 45 (12): 687–694.
  • Yasuoka, Kōichi [安岡 孝一], 2006 "Differences between the JIS kanji plan (1976) and JIS C 6226-1978" [JIS漢字案 (1976) とJIS C 6226-1978の異同] at the 17th "Computer Usage for Oriental Studies" [東洋学へのコンピュータ利用] research seminar. 3–51.
  • Yasuoka, Kōichi [安岡 孝一] & Motoko Yasuoka [安岡 素子], 2006. The History of Character Codes: Europe, America, and Japan [文字符号の歴史: 欧米と日本編]. Tokyo: Kyōritsu Shuppan (ISBN 4-32012102-3).

External links

  • The International Register that the IPSJ/ITSCJ supervises.
    • Japanese Character Set JIS C 6226-1978
    • Japanese Character Set JIS C 6226-1983
    • Update Registration 87 Japanese Graphic Character Set for Information Interchange
  • (in Japanese) (the latest standard may be read here).
  • (in Japanese) : (a copy of the latest standard may be purchased here).
  • (in Japanese) Unification-related provisions in the JIS X 0208 and 0213 standards
  • (in Japanese) Cyber Librarian – JIS kanji listing

0208, this, article, includes, list, general, references, lacks, sufficient, corresponding, inline, citations, please, help, improve, this, article, introducing, more, precise, citations, december, 2017, learn, when, remove, this, template, message, byte, char. This article includes a list of general references but it lacks sufficient corresponding inline citations Please help to improve this article by introducing more precise citations December 2017 Learn how and when to remove this template message JIS X 0208 is a 2 byte character set specified as a Japanese Industrial Standard containing 6879 graphic characters suitable for writing text place names personal names and so forth in the Japanese language The official title of the current standard is 7 bit and 8 bit double byte coded KANJI sets for information interchange 7ビット及び8ビットの2バイト情報交換用符号化漢字集合 Nana Bitto Oyobi Hachi Bitto no Ni Baito Jōhō Kōkan yō Fugōka Kanji Shugō It was originally established as JIS C 6226 in 1978 and has been revised in 1983 1990 and 1997 It is also called Code page 952 by IBM The 1978 version is also called Code page 955 by IBM JIS X 0208Alias es JIS C 6226Language s Japanese English Russian BulgarianPartial support Greek ChineseStandardJIS X 0208 1978 through 1997ClassificationISO 2022 DBCS CJK encodingExtensionsARIB STD B24 Kanji NEC PC98 DBCSEncoding formatsShift JIS SJIS ISO 2022 JP JIS EUC JP UJIS Preceded byJIS X 0201Succeeded byJIS X 0213Other related encoding s KS X 1001 GB 2312 JIS X 0212vte Contents 1 Scope of use and compatibility 2 Code charts 2 1 Lead byte 2 2 Non Kanji rows 2 2 1 Character set 0x21 row number 1 special characters 2 2 2 Character set 0x22 row number 2 special characters 2 2 3 Character set 0x23 row number 3 digits and Roman 2 2 4 Character set 0x24 row number 4 Hiragana 2 2 5 Character set 0x25 row number 5 Katakana 2 2 6 Character set 0x26 row number 6 Greek 2 2 7 Character set 0x27 row number 7 Cyrillic 2 2 8 Character set 0x28 row number 8 box drawing 2 2 
9 Extension character set 0x2D row number 13 NEC special characters 2 3 Kanji rows 3 Code structure 3 1 Single byte codes 3 2 Code points and code numbers 3 3 Unassigned code points 3 4 Character names 4 Kanji set 4 1 Overview 4 2 Special characters numerals and Latin characters 4 3 Hiragana and katakana 4 4 Kanji 4 4 1 Level partitioning 4 4 2 Arrangement 4 4 3 Kanji from unknown sources 4 4 4 Unification of kanji variants 4 4 5 Unification criteria for compatibility 5 Character encodings 5 1 Encoding schemes stipulated by JIS X 0208 5 2 Escape sequences for JIS X 0202 ISO 2022 5 3 Duplicate encodings of ASCII and JIS X 0201 5 4 Comparison of encoding schemes used in practice 6 History 6 1 First standard 6 2 Second standard 6 3 Third standard 6 4 Fourth standard 6 5 Successors 7 Implementations 8 Relation to other standards 8 1 ISO IEC 646 IRV and ASCII 8 2 JIS X 0201 8 3 JIS X 0212 8 4 JIS X 0213 8 5 ISO IEC 10646 and Unicode 9 Footnotes 9 1 Explanatory 9 2 Reference footnotes 10 See also 11 References 12 External linksScope of use and compatibility EditThe character set JIS X 0208 establishes is primarily for the purpose of information interchange 情報交換 jōhō kōkan between data processing systems and the devices connected to them or mutually between data communication systems This character set can be used for data processing and text processing Partial implementations of the character set are not considered compatible Because there are places where such things have happened as the original drafting committee of the first standard taking care to separate characters between level 1 and level 2 and the second standard then shuffling some variant characters 異体字 itaiji between the levels at least in the first and second standards it is conjectured that non kanji and level 1 only implementation Japanese computer systems were at one time considered for development However such implementations have never been specified as compatible though examples such as the early NEC 
PC 9801 did exist 1 Even though there are provisions in the JIS X 0208 1997 standard concerning compatibility at the present time it is generally considered that this standard neither certifies compatibility nor is it an official manufacturing standard that amounts to a declaration of self compatibility 2 Consequently de facto JIS X 0208 compatible products are not considered to exist Terminology such as conformant 準拠 junkyo and support 対応 taiō is included in JIS X 0208 but the semantics of these terms vary from person to person Code charts EditLead byte Edit The first encoding byte corresponds to the row or cell number plus 0x20 or 32 in decimal see below Hence the code set starting with 0x21 has a row number of 1 and its cell 1 has a continuation byte of 0x21 or 33 and so forth For lead bytes used for characters other than kanji links are provided to charts on this page listing the characters encoded under that lead byte For lead bytes used for kanji links are provided to the appropriate section of Wiktionary s kanji index JIS X 0208 lead bytes 0 1 2 3 4 5 6 7 8 9 A B C D E F2x SP 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 3x 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 4x 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 5x 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 6x 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 7x 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 DELNon Kanji rows Edit Character set 0x21 row number 1 special characters Edit Some vendors use slightly different Unicode mapping for this set than the one below For example Microsoft maps kuten 1 29 JIS 0x213D to U 2015 Horizontal Bar 3 whereas Apple maps it to U 2014 Em Dash 4 Similarly Microsoft maps kuten 1 61 JIS 0x215D to U FF0D 3 the fullwidth form of U 002D Hyphen Minus and Apple maps it to U 2212 Minus Sign 4 Unicode mapping of the wave dash also differs between vendors See the cells with footnotes below ASCII and JISCII punctuation shown here with a yellow background may use alternative 
mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201 such as Shift JIS EUC JP or ISO 2022 JP JIS X 0208 prefixed with 0x21 0 1 2 3 4 5 6 7 8 9 A B C D E F2x IDSP 3x ヽ ヾ ゝ ゞ 仝 々 〆 ー b 4x c d 5x e 6x lt gt 7x amp Character set 0x22 row number 2 special characters Edit Most of the characters in this set were added in 1983 except for characters 0x2221 0x222E kuten 2 1 through 2 14 or the first line of the chart below which were included in the original 1978 version of the standard JIS X 0208 prefixed with 0x22 0 1 2 3 4 5 6 7 8 9 A B C D E F2x 3x 4x 5x 6x 7x Å Character set 0x23 row number 3 digits and Roman Edit This set includes a subset of the ISO 646 invariant set and therefore also a subset of both ASCII and the JIS X 0201 Roman set minus punctuation and symbols comprising western Arabic numerals and both cases of the Basic Latin alphabet Characters in this set may use alternative Unicode mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201 such as EUC JP Shift JIS or ISO 2022 JP Compare row 3 of KPS 9566 which this row exactly matches Compare and contrast row 3 of KS X 1001 and of GB 2312 which include their entire national variants of ISO 646 in this row rather than only the alphanumeric subset JIS X 0208 prefixed with 0x23 0 1 2 3 4 5 6 7 8 9 A B C D E F2x3x 0 1 2 3 4 5 6 7 8 94x A B C D E F G H I J K L M N O5x P Q R S T U V W X Y Z6x a b c d e f g h i j k l m n o7x p q r s t u v w x y zCharacter set 0x24 row number 4 Hiragana Edit This row contains Japanese Hiragana Compare row 4 of GB 2312 which matches this row Compare and contrast row 10 of KPS 9566 and of KS X 1001 which use the same layout but in a different row JIS X 0208 prefixed with 0x24 0 1 2 3 4 5 6 7 8 9 A B C D E F2x ぁ あ ぃ い ぅ う ぇ え ぉ お か が き ぎ く3x ぐ け げ こ ご さ ざ し じ す ず せ ぜ そ ぞ た4x だ ち ぢ っ つ づ て で と ど な に ぬ ね の は5x ば ぱ ひ び ぴ ふ ぶ ぷ へ べ ぺ ほ ぼ ぽ ま 
み6x む め も ゃ や ゅ ゆ ょ よ ら り る れ ろ ゎ わ7x ゐ ゑ を んCharacter set 0x25 row number 5 Katakana Edit This row contains Japanese Katakana Compare row 5 of GB 2312 which matches this row Compare and contrast row 11 of KPS 9566 and of KS X 1001 which use the same layout but in a different row Contrast the considerably different Katakana layout used by JIS X 0201 JIS X 0208 prefixed with 0x25 0 1 2 3 4 5 6 7 8 9 A B C D E F2x ァ ア ィ イ ゥ ウ ェ エ ォ オ カ ガ キ ギ ク3x グ ケ ゲ コ ゴ サ ザ シ ジ ス ズ セ ゼ ソ ゾ タ4x ダ チ ヂ ッ ツ ヅ テ デ ト ド ナ ニ ヌ ネ ノ ハ5x バ パ ヒ ビ ピ フ ブ プ ヘ ベ ペ ホ ボ ポ マ ミ6x ム メ モ ャ ヤ ュ ユ ョ ヨ ラ リ ル レ ロ ヮ ワ7x ヰ ヱ ヲ ン ヴ ヵ ヶCharacter set 0x26 row number 6 Greek Edit This row contains basic support for the modern Greek alphabet without diacritics or the final sigma Compare row 6 of GB 2312 and GB 12345 and row 6 of KPS 9566 which include the same Greek letters in the same layout although GB 12345 adds vertical presentation forms and KPS 9566 adds Roman numerals Compare and contrast row 5 of KS X 1001 which offsets the Greek letters to include the Roman numerals first JIS X 0208 prefixed with 0x26 0 1 2 3 4 5 6 7 8 9 A B C D E F2x A B G D E Z H 8 I K L M N 3 O3x P R S T Y F X PS W4x a b g d e z h 8 i k l m n 3 o5x p r s t y f x ps w6x7xCharacter set 0x27 row number 7 Cyrillic Edit This row contains the modern Russian alphabet and is not necessarily sufficient for representing other forms of the Cyrillic script Compare row 7 of GB 2312 which matches this row Compare and contrast row 12 of KS X 1001 and row 5 of KPS 9566 which use the same layout but in a different row JIS X 0208 prefixed with 0x27 0 1 2 3 4 5 6 7 8 9 A B C D E F2x A B V G D E Yo Zh Z I J K L M N3x O P R S T U F H C Ch Sh Sh Y E4x Yu Ya5x a b v g d e yo zh z i j k l m n6x o p r s t u f h c ch sh sh y e7x yu yaCharacter set 0x28 row number 8 box drawing Edit All characters in this set were added in 1983 and were not present in the original 1978 revision of the standard JIS X 0208 prefixed with 0x28 0 1 2 3 4 5 6 7 8 9 A B C D E F2x 3x 4x 
5x6x7xExtension character set 0x2D row number 13 NEC special characters Edit Rows 9 through 15 of the JIS X 0208 standard are left empty However the following layout for row 13 first introduced by NEC is a common extension It is used with minor variations noted in footnotes by Windows 932 3 which is matched by the WHATWG Encoding Standard used by HTML5 by the PostScript variant but since KanjiTalk version 7 not the regular variant 5 of MacJapanese and by JIS X 0213 the successor to JIS X 0208 5 6 Unlike the other extensions made by Windows 932 WHATWG and JIS X 0213 the two match rather than colliding so decoding of most of this row is better supported than the other extensions made by JIS X 0213 NEC Special Characters for JIS X 0208 prefixed by 0x2D 0 1 2 3 4 5 6 7 8 9 A B C D E F2x 3x f 4x 5x f g 6x 7x h h h h h h h h h f f Kanji rows Edit See Appendix Japanese kanji by JIS X 0208 kuten code on Wiktionary Code structure EditIn order to represent code points column line numbers are used for one byte codes and kuten numbers are used for two byte codes For a way to identify a character without depending on a code character names are used Single byte codes Edit Almost all JIS X 0208 graphic character codes are represented with two bytes of at least seven bits each However every control character as well as the plain space although not the ideographic space is represented with a one byte code In order to represent the bit combination ビット組合せ bitto kumiawase of a one byte code two decimal numbers a column number and a line number are used Three high order bits out of seven or four high order bits out of eight counting from zero to seven or from zero to fifteen respectively form the column number Four low order bits counting from zero to fifteen form the line number Each decimal number corresponds to one hexadecimal digit For example the bit combination corresponding to the graphic character space is 010 0000 as a 7 bit number and 0010 0000 as an 8 bit number In column 
line notation this is represented as 2 0 Other representations of the same single byte code include 0x20 as hexadecimal or 32 as a single decimal number Code points and code numbers Edit The double byte codes are laid out in 94 numbered groups each called a row 区 ku lit section Every row contains 94 numbered codes each called a cell 点 ten lit point i This makes a total of 8836 94 94 possible code points although not all are assigned see below these are laid out in the standard in a 94 line 94 column code table A row number and a cell number each numbered from 1 to 94 for a standard JIS X 0208 code form a kuten 区点 point which is used to represent double byte code points A code number or kuten number 区点番号 kuten bangō is expressed in the form row cell the row and cell numbers being separated by a hyphen For example the character 亜 has a code point at row 16 cell 1 so its code number is represented as 16 01 In 7 bit JIS X 0208 as might be switched to in JIS X 0202 ISO 2022 JP both bytes must be from the 94 byte range of 0x21 used for row or cell number 1 through 0x7E used for row or cell number 94 exactly corresponding to the range used for 7 bit ASCII printing characters not counting the space Accordingly the encoded bytes are obtained by adding 0x20 32 to each number 7 For instance the above example of 16 01 亜 would be represented by the bytes 0x30 0x21 The 8 bit EUC JP instead uses the range 0xA1 through 0xFE setting the high bit to 1 whereas other encodings such as Shift JIS use more complicated transforms Shift JIS includes more encoding space than is needed for JIS X 0208 itself some Shift JIS specific extensions to JIS X 0208 make use of row numbers above 94 8 This structure is also used in the Mainland Chinese GB 2312 where it is natively known as 区位 quwei and the South Korean KS C 5601 currently KS X 1001 where the ku and ten are respectively known as hang 9 행 行 haeng and yol 9 열 列 yeol The later JIS X 0213 extends this structure by having more than one plane 
面 men lit face of rows which is also the structure used by CNS 11643 and related to the structure used by CCCII Unassigned code points Edit Among the 2 byte codes rows 9 to 15 and 85 to 94 are unassigned code points 空き領域 aki ryōiki that is they are code points with no characters assigned to them Also some cells in other rows are also essentially unassigned code points These empty areas contain code points that should basically not be used Except when there is prior agreement among the relevant parties characters gaiji for information interchange should not be assigned to the unassigned code points Even when assigning characters to unassigned code points graphic characters defined in the standard should not be assigned to them and the same character should not be assigned to multiple unassigned code points characters should not be duplicated in the set Furthermore when assigning characters to unassigned code points it is necessary to be cautious of unification in regards to kanji glyphs For example row 25 cell 66 corresponds to the kanji meaning high or expensive both the form with a component resembling the mouth character 口 in the middle 高 and the less common form with a ladder like construction in the same location 髙 are subsumed into the same code point Consequently limiting point 25 66 to the mouth form and assigning the latter ladder form to an unassigned code point would technically be in violation of the standard In practice however several vendor specific Shift JIS variants including Windows 932 and MacJapanese encode vendor extensions in unallocated rows of the encoding space for JIS X 0208 Also most of the codes unassigned in JIS X 0208 are assigned by the newer JIS X 0213 standard Character names Edit See also Unicode character property Name Each JIS X 0208 character is given a name By using a character s name it is possible to identify characters without relying on their codes The names of characters are coordinated with other character set standards 
notably the Universal Coded Character Set (UCS/Unicode), so this is one possible source of character mappings to character sets such as Unicode. For example, both the character at ISO/IEC 646 International Reference Version (US-ASCII) column 4 line 1 and the one at JIS X 0208 row 3 cell 33 have the name "LATIN CAPITAL LETTER A". Therefore, the character at 4/1 in ASCII and the character at 3-33 in JIS X 0208 can be regarded as the same character (although, in practice, an alternative mapping is used for the JIS X 0208 character because encodings provide ASCII separately). Conversely, ASCII characters 2/2 (quotation mark), 2/7 (apostrophe), 2/13 (hyphen-minus), and 7/14 (tilde) can be determined to be characters that do not exist in this standard.

Character names of non-kanji characters use uppercase Roman letters, spaces, and hyphens. Non-kanji characters are given a Japanese-language common name (日本語通用名称, Nihongo tsūyō meishō), but some provisions for these names do not exist.[j] The names of kanji, on the other hand, are mechanically set according to the corresponding hexadecimal representation of their code in UCS/Unicode. The name of a kanji can be arrived at by prefixing the Unicode code point with "CJK UNIFIED IDEOGRAPH-". For example, row 16 cell 1 (亜) corresponds to U+4E9C in UCS, so its name would be "CJK UNIFIED IDEOGRAPH-4E9C". Kanji are not given Japanese common names.

Kanji set

Overview

JIS X 0208 prescribes a set of 6879 graphical characters that correspond to two-byte codes with either seven or eight bits to the byte; in JIS X 0208, this is called the kanji set (漢字集合, kanji shūgō), which includes 6355 kanji as well as 524 non-kanji (非漢字, hikanji), including characters such as Latin letters, kana, and so forth.

Special characters: Occupies rows 1 and 2. There are 18 descriptor symbols (記述記号, kijutsu kigō) such as the "ideographic space" and the Japanese comma and period; eight diacritical marks such as dakuten and handakuten; 10 characters for things that follow kana or kanji (仮名又は漢字に準じるもの, kana mata wa kanji ni junjiru mono) such as the iteration mark; 22 bracket symbols (括弧記号, kakko kigō); 45 mathematical symbols (学術記号, gakujutsu kigō); and 32 unit symbols, which include the currency sign and the postal mark, for a total of 147 characters.
Numerals: Occupies part of row 3. The ten digits from "0" to "9".
Latin letters: Occupies part of row 3. The 26 letters of the English alphabet in uppercase and lowercase form, for a total of 52.
Hiragana: Occupies row 4. Contains 48 unvoiced kana (including the obsolete wi and we), 20 voiced kana (dakuten), 5 semi-voiced kana (handakuten), and 10 small kana for palatalized and assimilated sounds, for a total of 83 characters.
Katakana: Occupies row 5. There are 86 characters; in addition to the katakana equivalents of the hiragana characters, the small ka/ke kana (ヵ/ヶ) and the vu kana (ヴ) are included.
Greek letters: Occupies row 6. The 24 letters of the Greek alphabet in uppercase and lowercase form (minus the final sigma), for a total of 48.
Cyrillic letters: Occupies row 7. The 33 letters of the Russian alphabet in uppercase and lowercase form, for a total of 66.
Box-drawing characters: Occupies row 8. Thin segments, thick segments, and mixed thin and thick segments, 32 in total.
Kanji: The 2965 characters of level 1 (第1水準, dai ichi suijun, from row 16 to row 47) and the 3390 characters of level 2 (第2水準, dai ni suijun, from row 48 to row 84), for a total of 6355.

Special characters, numerals, and Latin characters

As for the special characters in the kanji set, some characters from the graphic character set of the International Reference Version (IRV) of ISO/IEC 646:1991 (equivalent to ASCII) are absent from JIS X 0208. These are the aforementioned four characters "QUOTATION MARK", "APOSTROPHE", "HYPHEN-MINUS", and "TILDE". The former three are split into different code points in the kanji set (Nishimura, 1978; JIS X 0221-1:2001 standard, Section 3.8.7). The "TILDE" of IRV has no corresponding character in the kanji set.

In the following table, the ISO/IEC 646:1991 IRV characters in question are compared with their multiple equivalents in JIS X
0208, except for the IRV character "TILDE", which is compared with the "WAVE DASH" of JIS X 0208. The entries under the "Symbol" columns utilize UCS/Unicode code points, so the specifics of display may differ.

The ASCII/IRV characters without exact JIS X 0208 equivalents were later assigned code points by JIS X 0213; these are also listed below, as are Microsoft's mappings of the four characters.

Non-strict correspondence between ISO/IEC 646:1991 IRV (ASCII) and JIS X 0208
IRV column/line | JIS X 0213[6] | Microsoft | IRV symbol | IRV name | JIS X 0208 equivalents (kuten, symbol, name)
2/2 | 1-2-16 | 92-94[A], 115-24[B] | " | QUOTATION MARK | 1-15 ¨ DIAERESIS; 1-40 “ LEFT DOUBLE QUOTATION MARK; 1-41 ” RIGHT DOUBLE QUOTATION MARK; 1-77 ″ DOUBLE PRIME
2/7 | 1-2-15 | 92-93[A], 115-23[B] | ' | APOSTROPHE | 1-13 ´ ACUTE ACCENT; 1-38 ‘ LEFT SINGLE QUOTATION MARK; 1-39 ’ RIGHT SINGLE QUOTATION MARK; 1-76 ′ PRIME
2/13 | 1-2-17 | 1-61[C] | - | HYPHEN-MINUS | 1-30 ‐ HYPHEN; 1-61 − MINUS SIGN
7/14 | 1-2-18 | 1-33[D] | ~ | TILDE | (no corresponding character); 1-33 〜 WAVE DASH[D]

[A] From NEC selection of IBM extensions. Occupies a code point unallocated in JIS X 0208.
[B] From IBM extensions. Outside the range of JIS X 0208, but encodable in Shift JIS.
[C] Microsoft treats the JIS minus sign as a fullwidth form of the hyphen-minus.
[D] Wave Dash is sometimes treated as a fullwidth form of the tilde, e.g. by Microsoft; see Tilde (Unicode and Shift JIS encoding of wave dash). The ASCII/IRV tilde is an ambiguous code point which may appear either as a tilde accent mark or as a dash with the same curvature (although the dash is more common, due to the spacing accent having a separate code point in Windows-1252); there is no JIS X 0208 character for a tilde accent. Character 1-2-18 in JIS X 0213 is shown as a tilde accent in the code chart.[6]

This means that the kanji set is the most widespread non-upward-compatible character set in the world; it is counted as one of the weak points of this standard.

Even with the 90 special characters, numerals, and Latin letters the kanji set and the IRV set have in common, this standard
does not follow the arrangement of ISO/IEC 646. These 90 characters are split between rows 1 (punctuation) and 3 (letters and numbers), although row 3 does follow the ISO 646 arrangement for the 62 letters and numbers alone (e.g. 4/1 ("A") in ISO 646 becomes 2/3 4/1 (i.e. 3-33) in JIS X 0208).

These incompatibilities are thought to be the reason why the numerals, Latin letters, and so forth in the kanji set came to be treated as "full-width alphanumeric characters" (全角英数字, zenkaku eisūji), and why early implementations interpreted them differently from the corresponding IRV characters.

Ever since the first standard, it has been possible to represent composites (合成, gōsei) such as encircled numbers, ligatures for measurement unit names, and Roman numerals;[10] they were not given independent kuten code points. Although individual companies that manufacture information systems can make an effort to represent these characters by composition as customers may require, none has requested to have them added to the standard, instead choosing to offer them as proprietary gaiji.

In the fourth standard (1997), all these characters were explicitly defined as characters that accompany an advancement of the current position; that is to say, they are spacing characters. Furthermore, it was ruled that they should not be made by the composition of characters. For this reason, it became disallowed to represent Latin characters with diacritics at all, with possibly the sole exception of the angstrom symbol (Å) at row 2 cell 82.

Hiragana and katakana

The hiragana and katakana in JIS X 0208, unlike those in JIS X 0201, include dakuten and handakuten markings as part of a character. The katakana wi (ヰ) and we (ヱ) (both obsolete in modern Japanese) as well as the small wa (ヮ), not in JIS X 0201, are also included.

The arrangement of kana in JIS X 0208 is different from the arrangement of katakana in JIS X 0201. In JIS X 0201, the syllabary starts with wo (ヲ), followed by the small kana sorted by gojūon order, followed by the
full-size kana, also in gojūon order (ヲァィゥェォャュョッーアイウエオ…ラリルレロワン).

On the other hand, in JIS X 0208, the kana are sorted first by gojūon order, then in the order of "small kana, full-size kana, kana with dakuten, and kana with handakuten", such that the same fundamental kana is grouped with its derivatives (ぁあぃいぅうぇえぉお…っつづ…はばぱひびぴふぶぷへべぺほぼぽ…ゎわゐゑをん). This ordering was chosen to simplify the sorting of kana-based dictionary look-ups (Yasuoka, 2006).[k]

As mentioned above, the katakana order previously defined in JIS X 0201 was not followed in JIS X 0208. The treatment of the JIS X 0201 katakana as "half-width kana" is thought to have arisen from this incompatibility with the katakana of this standard. This point is also one of the weaknesses of this standard.

Kanji

How the kanji in this standard were chosen and from what sources, why they are split into level 1 and level 2, and how they are arranged are all explained in detail in the fourth standard (1997). Per that explanation, the kanji included in the following four kanji listings were reflected in the 6349 characters of the first standard (1978).

Kanji Listing for Standard Code (Tentative) (標準コード用漢字表 (試案), Hyōjun Kōdo-yō Kanjihyō (Shian)): The Information Processing Society of Japan kanji code committee compiled this list in 1971. In the "Correspondence Analysis Results" described below, this appears as 6086 characters.
Basic Kanji for Administrative Data Processing Use (行政情報処理用基本漢字, Gyōsei Jōhō Shoriyō Kihon Kanji): Selected by the Administrative Management Agency of Japan in 1975, it consists of 2817 characters. As data for the purpose of selection, the Agency made a report which, starting with the "Kanji Listing for Standard Code (Tentative)", contrasted several kanji listings: the Correspondence Analysis Results and Frequency of Use of Kanji for Administrative Data Processing Use Normal Kanji Selection (行政情報処理用標準漢字選定のための漢字の使用頻度および対応分析結果, Gyōsei Jōhō Shoriyō Kihon Kanji Sentei no Tame no Kanji no Shiyō Hindo Oyobi Taiō Bunseki Kekka), or Correspondence Analysis
Results (対応分析結果, Taiō Bunseki Kekka) for short.
Japanese Personality Registration Name Kanji (日本生命収容人名漢字, Nihon Seimei Shuyō Jinmei Kanji): One of the kanji listings that compose the "Correspondence Analysis Results", consisting of 3044 characters. It no longer exists. The original list was not available to the original drafting committee; this kanji list was reflected in the standard by following the "Correspondence Analysis Results".
Kanji for National Administrative District Listing (国土行政区画総覧使用漢字, Kokudo Gyōsei Kukaku Sōran Shiyō Kanji): One of the kanji listings that compose the "Correspondence Analysis Results", consisting of 3251 characters. They are the kanji used in the list of all administrative place names compiled by the Japan Geographic Data Center, the National Administrative District Listing (国土行政区画総覧, Kokudo Gyōsei Kukaku Sōran). The original drafting committee did not investigate the listing itself; the kanji used from this list followed the "Correspondence Analysis Results".

In the second and third standards, four and two characters respectively were added to level 2, bringing the total kanji to 6355. Also, in the second standard, character forms were changed and characters were transposed between the levels; in the third standard, character forms were changed as well. These changes are described further below.

Level partitioning

The 2,965 Level 1 kanji occupy rows 16 to 47. The 3,390 Level 2 kanji occupy rows 48 to 84.

For level 1, characters common to multiple kanji glyph listings were chosen, using the tōyō kanji, the tōyō kanji correction draft, and the jinmeiyō kanji as a basis. Also, JIS C 6260 ("To-Do-Fu-Ken (Prefecture) Identification Code"; currently JIS X 0401) and JIS C 6261 ("Identification code for cities, towns and villages"; currently JIS X 0402) were consulted; kanji for nearly all Japanese prefectures, cities, districts, wards, towns, villages, and so forth were intentionally placed in level 1.[l] Furthermore, amendments by experts were added.

Level 2 was dedicated to kanji that made an appearance in the aforementioned
four major listings but were not selected for level 1. As noted below, the kanji of level 1 were ordered by their pronunciation, so some kanji whose pronunciation was difficult to determine were transferred from level 1 to level 2 on that basis (Nishimura, 1978).

Due to these decisions, for the most part, level 1 contains more frequently used kanji and level 2 contains less frequently used kanji, but those were judged by the standards of the day. Over the passage of time, some level 2 kanji have become more frequently used, such as one meaning "to soar" (翔) and one meaning "to glitter" (煌); inversely, some level 1 kanji have become infrequent, notably the ones meaning "centimeter" (糎) and "millimeter" (粍). Of the current jōyō kanji, 30 fall into level 2,[m] while three are missing altogether (塡, 剝 and 頰).[n] Of the current jinmeiyō kanji, 192 are in level 2,[o] while 105 are not part of the standard.[p]

Arrangement

The kanji in level 1 are sorted in order of each one's "representative reading" (i.e. a canonical reading chosen for the purposes of this standard only); the reading of a kanji for this purpose may be an on or a kun reading, and readings are sorted in gojūon order.[q] As a general rule, the on (Chinese-sound) reading is considered the representative reading; where a kanji has multiple on readings, the reading judged to be predominant in use frequency is used as the representative reading (JIS C 6226-1978 standard, Section 3.4).

For the small percentage of kanji that either do not have an on reading or have an on reading which is little known and not in common use, the kun reading was employed as the representative reading. Where a verb kun reading must be used as the representative reading, the ren'yōkei (rather than the shūshikei) form is used.

For example, cells 1 to 41 on row 16 are 41 characters sorted as starting with a reading of a. Within these, 22 characters, including 16-10 (葵, on reading ki; kun reading aoi) and 16-32 (粟, on readings zoku and shoku; kun reading awa), are there on the
basis of their kun readings. 16-09 (逢, on reading hō; kun reading a(i)) and 16-23 (扱, on readings sō and kyū; kun reading atsuka(i)) are just two examples of ren'yōkei form verbs used for the representative reading.

Where the representative reading is the same between different kanji, a kanji that uses an on reading is placed ahead of one that uses a kun reading. Where the on or kun readings are the same between more than one kanji, they are then ordered by their primary radical and stroke count.

Whether on level 1 or level 2, itaiji are arranged to directly follow their exemplar form. For example, in level 2, right after row 49 cell 88 (劍), the immediately following characters deviate from the general rule (stroke count, in this case) to include three variants of 49-88 (劔, 劒, and 剱).[r]

The kanji in level 2 are arranged in order of primary radical and stroke count. Where these two properties are the same for different kanji, they are then sorted by reading.

Kanji from unknown sources

Kanji for which sources are unclear, unknown, or otherwise unidentifiable in JIS X 0208:1997, Appendix 7
Kuten | Symbol | Classification
52-55 | 墸 | Unknown
52-63 | 壥 | Unknown
54-12 | 妛 | Source unclear
55-27 | 彁 | Unidentifiable
57-43 | 挧 | Source unclear
58-83 | 暃 | Source unclear
59-91 | 椢 | Source unclear
60-57 | 槞 | Source unclear
74-12 | 蟐 | Source unclear
74-57 | 袮 | Source unclear
79-64 | 閠 | Source unclear
81-50 | 駲 | Source unclear

It has been pointed out that there are kanji in the kanji set that are not found in comprehensive, unabridged kanji dictionaries, and that the sources thereof are unknown. For example, only one year after the first standard was established, Tajima (1979) reported that he had confirmed 63 kanji that were not to be found in Shinjigen (a large kanji dictionary published by Kadokawa Shoten) nor in Dai Kan-Wa jiten, and that did not make sense as ryakuji of any sort; he noted that it would be preferable for kanji not available in kanji dictionaries to be selected from definite sources. These kanji came to be known as "ghost characters" (幽霊文字, yūrei moji)
or "ghost kanji" (幽霊漢字, yūrei kanji), among other names.

The drafting committee for the fourth version of the standard also saw the existence of kanji with unknown sources as a problem, and so made an inquiry into just what kind of sources the drafting committee of the first version had referenced. As a result, it was discovered that the original drafting committee had relied heavily on the "Correspondence Analysis Results" to collect kanji. When the drafting committee investigated the "Correspondence Analysis Results", it became clear that many of the kanji included in the kanji set but not found in exhaustive kanji dictionaries supposedly came from the "Japanese Personality Registration Name Kanji" and "Kanji for National Administrative District Listing" lists mentioned in the "Correspondence Analysis Results".

It was confirmed that no original text exists for the "Japanese Personality Registration Name Kanji" referenced in the "Correspondence Analysis Results". For the "National Administrative District Listing", Sasahara Hiroyuki of the fourth version's drafting committee examined the kanji that appeared on the in-progress development pages for the first standard. The committee also consulted many ancient writings, as well as many examples of personal names in a database of NTT phone books.

Due to this thorough investigation, the committee was able to pare down the number of kanji for which the source cannot be confidently explained to twelve, shown in the table above. Of these, it is conjectured that several glyphs came about due to copying errors. In particular, 妛 was probably created when printers tried to create 𡚴 by cutting and pasting 山 and 女 together. A shadow from that process was misinterpreted as a line, resulting in 妛 (a picture of this can be found in the Jōyō kanji jiten).

Unification of kanji variants

According to the specifications in the fourth standard (1997), unification (包摂, hōsetsu; not the same term used for Unicode's "unification", although it is nearly the same concept) is the action of
giving the same code point to a character without regard to its different character forms. In the fourth standard, the glyphs allowed are limited; the extent to which particular allographic glyphs are unified into a graphemic code point is clearly defined.

Furthermore, according to the specifications in the standard, a glyph (字体, jitai, lit. "character body") is an abstract notion as to the graphical representation of a graphic character; a character form (字形, jikei, lit. "character shape"; also a "glyph" in a sense, but differentiated on a different level for standardization purposes) is the representation as a graphical shape that a glyph takes in actuality (e.g. due to a glyph being handwritten, printed, displayed on a screen, etc.). For a single glyph, there exists an endless range of possible concretely and/or visibly different character forms. A variation between character forms of one glyph is termed a "design difference" (デザインの差, dezain no sa).

The extent to which a glyph is unified to one code point is determined according to that code point's "example glyph" (例示字体, reiji jitai) and the "unification criteria" (包摂規準, hōsetsu kijun) that can be applied to that example glyph; that is, the example glyph for a code point applies to that code point, and any glyphs for which the parts that compose the example glyph are replaced in accordance with the unification criteria also apply to that code point.

For example, the example glyph at 33-46 (僧) is composed of radical 9 (亻) and the kanji that eventually spawned the so kana (曽). Also, in unification criterion 101, there are three kanji displayed: the first takes the form most often seen in Japanese (曽), the second contains a more traditional form (曾) in which the first two strokes form radical 12 (the kanji numeral for the number 8, 八), and the third is like the second, except that radical 12 is inverted (曾). Consequently, all three permutations (僧, 僧, 僧) apply to the code point at row 33 cell 46.

In the fourth standard, including one of the errata for the first printing, there are 186
unification criteria.

When a code point's example glyph is composed of more than one part glyph, unification criteria can be applied to each part. After a unification criterion is applied to one part glyph, that part cannot have any more unification criteria applied to it. Also, a unification criterion is not allowed to apply if the resulting glyph would coincide with that of another code point entirely.

An example glyph is no more than an example for that code point; it is not a glyph "endorsed" by the standard. Also, the unification criteria need only be used for generally used kanji and for the purpose of assigning things to the code points of this standard. The standard requests that generally unused kanji not be created based on the example glyphs and unification criteria.

The kanji of the kanji set are not chosen completely consistently according to the unification criteria. For example, although 41-07 corresponds to the form where the third and fourth strokes cross (彥) as well as the form where they do not (彦), according to unification criterion 72, 20-73 only corresponds to the form where they do not cross (顔), and 80-90 only corresponds to the form where they do (顏).

The terms "unification", "unification criteria", and "example glyph" were adopted in the fourth standard. From the first to the third version, kanji and relations between kanji were grouped into three types: "independent" (独立, dokuritsu), "compatible" (対応, taiō), and "equivalent" (同値, dōchi); it was explained that the characters recognized as equivalent "consolidate to just one point". Equivalence included, other than kanji with exactly the same shape, kanji with differences due to style and kanji where the difference in character form is small.

In the first standard, it was stipulated that "this standard does not establish the particulars of character forms" (Section 3.1); it also states that "the aim of this standard is to establish the general idea of characters and their codes; the design of their character forms and such lie outside its scope". In the second and
third standards as well, notes state that specific designs of character forms lie outside their scope (the note on item 1). The fourth standard also stipulates that "This standard regulates graphic characters as well as their bit patterns, and the use, specific designs of individual characters, and so forth are not within the scope of this standard" (JIS X 0208:1997, item 1).

Unification criteria for compatibility

In the fourth standard, "unification criteria for maintaining compatibility with previous standards" (過去の規格との互換性を維持するための包摂規準, kako no kikaku to no gokansei wo iji suru tame no hōsetsu kijun) are defined. Their application is limited to 29 code points whose glyphs vary greatly between JIS C 6226-1978 and the standards from JIS C 6226-1983 onward. For those 29 code points, the glyphs from JIS C 6226-1983 onward are displayed as "A" and the glyphs from JIS C 6226-1978 as "B". For each of them, both "A" and "B" glyphs may be applied. However, in order to claim compatibility with the standard, whether the A or B form has been used for each code point must be explicitly noted.

Character encodings

Encoding schemes stipulated by JIS X 0208

In JIS X 0208:1997, article 7, combined with appendices 1 and 2, defines a total of eight encoding schemes.

In the descriptions below, the "CL" (control left), "GL" (graphic left), "CR" (control right), and "GR" (graphic right) regions are, respectively, in column/line notation, from 0/0 to 1/15, from 2/1 to 7/14, from 8/0 to 9/15, and from 10/1 to 15/14. For each code, 2/0 is assigned the graphic character "SPACE" and 7/15 the control character "DELETE". The C0 control characters (defined in JIS X 0211 and matching ISO/IEC 6429) are assigned to the CL region.

7-bit encoding for kanji: Stipulated in the standard itself. The JIS X 0208 double-byte set is assigned to the GL region.
8-bit encoding for kanji: Stipulated in the standard itself. Same as the 7-bit encoding, but defined in terms of 8-bit bytes. The CR region may be unused, or encode the C1 control characters from JIS X 0211. The
GR region is unused.
International Reference Version + 7-bit encoding for kanji: Stipulated in the standard itself. The shift-in control character designates the ISO/IEC 646:1991 IRV (International Reference Version, equivalent to US-ASCII) to the GL region. Shift-out designates the JIS X 0208 double-byte set to the same region.
Latin characters + 7-bit encoding for kanji: Stipulated in the standard itself. As with IRV + 7-bit, but with ISO/IEC 646:IRV replaced with ISO/IEC 646:JP (the Roman set of JIS X 0201).
International Reference Version + 8-bit encoding for kanji: Stipulated in the standard itself. ISO/IEC 646:IRV is assigned to the GL region, JIS X 0208 to the GR region. This is effectively a subset of EUC-JP, excluding the half-width katakana from JIS X 0201 and the supplemental kanji from JIS X 0212.
Latin characters + 8-bit encoding for kanji: Stipulated in the standard itself. As with IRV + 8-bit, but with ISO/IEC 646:IRV replaced with ISO/IEC 646:JP.
Shift-coded character set: Stipulated in "Appendix 1: Shift-Coded Representation" (シフト符号化表現, Shifuto Fugōka Hyōgen). The authoritative definition of Shift JIS.
RFC 1468-coded character set: Stipulated in "Appendix 2: RFC 1468-Coded Representation" (RFC 1468符号化表現, RFC 1468 Fugōka Hyōgen). Resembles ISO-2022-JP (which is authoritatively defined in RFC 1468), but is defined in terms of eight-bit bytes, whereas ISO-2022-JP is defined in terms of seven-bit bytes.

Among the encodings stipulated in the fourth standard, only the "Shift-coded character set" is registered by the IANA.[11] However, certain others are closely related to IANA-registered encodings defined elsewhere (EUC-JP and ISO-2022-JP).

Escape sequences for JIS X 0202 / ISO 2022

JIS X 0208 may be used within ISO 2022 / JIS X 0202 (of which ISO-2022-JP is a subset). The escape sequences to designate JIS X 0208 to each of the four ISO 2022 code sets are listed below. Here, "ESC" refers to the control character "Escape" (0x1B, or 1/11).

ISO 2022 escape sequences to select JIS C 6226 and JIS X 0208
Standard | G0 | G1 | G2 | G3
78 | ESC 2/4 4/0 | ESC 2/4 2/9 4/0 | ESC 2/4 2/10 4/0 | ESC 2/4 2/11 4/0
83 | ESC 2/4 4/2 | ESC 2/4 2/9 4/2 | ESC 2/4 2/10 4/2 | ESC 2/4 2/11 4/2
90 onward | ESC 2/6 4/0 ESC 2/4 4/2 | ESC 2/6 4/0 ESC 2/4 2/9 4/2 | ESC 2/6 4/0 ESC 2/4 2/10 4/2 | ESC 2/6 4/0 ESC 2/4 2/11 4/2

The escape sequence starting "ESC 2/4" selects a multi-byte character set. The escape sequence starting "ESC 2/6" specifies a revision of the upcoming character set selection. JIS C 6226-1978 is identified by the multibyte-94-set identifier byte 4/0 (corresponding to ASCII @). JIS C 6226-1983 / JIS X 0208:1983 is identified by the multibyte-94-set identifier byte 4/2 (B). JIS X 0208:1990 is also identified by the 94-set identifier byte 4/2, but can be distinguished with the revision identifier 4/0 (@).

Duplicate encodings of ASCII and JIS X 0201

Further information: Halfwidth and fullwidth forms, and Halfwidth and Fullwidth Forms (Unicode block)

When using the kanji set of this standard with either the ISO/IEC 646:1991 IRV graphic character set (ASCII) or JIS X 0201's graphic character set for Latin characters (JIS-Roman), the treatment of the characters common to both sets becomes problematic. Unless one takes special measures, the characters included in both sets do not all map to each other one-to-one, and a single character may be given more than one code point; that is, it may cause a duplicate encoding.

When a character is common to both sets, JIS X 0208:1997 basically forbids the use of the code point in the kanji set (which is one of the two code points), eliminating duplicate encodings. It is judged that characters that have the same name are the same character.

For example, both the name of the character corresponding to the bit pattern 4/1 in ASCII and the name of the character corresponding to row 3 cell 33 of the kanji set are "LATIN CAPITAL LETTER A". In International Reference Version + 8-bit code for kanji, whether by the bit pattern 4/1 or by the bit pattern corresponding to the kanji set's row 3 cell 33 (10/3 12/1), the letter "A" (i.e. "LATIN CAPITAL LETTER A")
is represented. The standard forbids the use of the "10/3 12/1" bit pattern in an attempt to eliminate the duplicate encoding.

In consideration of implementations that treat the characters of the code points in the kanji set as "full-width characters" and those of ASCII or JIS-Roman as different characters, the use of the kanji set code points is permitted only for the sake of backwards compatibility. For example, for the purpose of backwards compatibility, it is permitted to consider 10/3 12/1 in International Reference Version + 8-bit code for kanji to correspond to a "full-width A".

If the kanji set is used along with ASCII or JIS-Roman, then even if the standard is abided by strictly, the unique encoding of a character is not guaranteed. For example, in the International Reference Version + 8-bit code for kanji, it is valid to represent a hyphen with the bit pattern 2/13 for the character "HYPHEN-MINUS" as well as with the kanji set's row 1 cell 30 (bit pattern 10/1 11/14) for the character "HYPHEN". In addition, the standard does not define which of the two to use for what, and so the hyphen is not given one unique encoding. The same problem affects the minus sign, the quotation marks, and so forth.

Moreover, even if the kanji set is used as a separate code, there is no guarantee that the unique encoding of characters is implemented. In many cases, the full-width "IDEOGRAPHIC SPACE" at row 1 cell 1 and the half-width space (2/0) coexist. How the two should differ is not self-explanatory, and is not specified in the standard.

Comparison of encoding schemes used in practice

Encoding | Alternate name | 7-bit?[A] | ISO 2022? | Stateless?[B] | Accepts ASCII? | 0x00–7F always ASCII? | Superset of 8-bit JIS X 0201? | Supports JIS X 0212? | Bytewise self-synchronizing? | Bitwise self-synchronizing?
ISO-2022-JP | "JIS" (JIS X 0202) | Yes | Yes | No[C] | Yes | Sequences can be non-ASCII[C] | No (encoding possible[D]) | Possible[E] | No | No
Shift JIS | "SJIS" | No | No | Yes | Almost[F] | Isolated bytes can be non-ASCII[G] | Yes | No | No | No
EUC-JP | "UJIS" (Unixized JIS) | No | Yes[H] | Yes[H] | Usually[I] | Yes | No (encoded[J]) | Usually available[K] | No | No
Unicode formats, for comparison[L]
UTF-8 | | No | No | Yes | Yes | Yes | No (encoded) | Available | Yes | Usually[M]
UTF-16 | "Unicode"[N] | No | No | Yes | No | No | No (encoded) | Available | Over 16-bit words only | No
GB 18030 | | No | No[O] | Yes | Yes | Isolated bytes can be non-ASCII | No (encoded) | Available | No | No
UTF-32 | | No | No | Yes | No | No | No (encoded) | Available | Usually in practice[P] | No

[A] i.e. does not require 8-bit clean transmission.
[B] i.e. the sequence used to encode a given character is always the same, no matter what the previous character(s) were. See state (computer science).
[C] ISO-2022-JP is a stateful encoding: all charsets are encoded over 0x21–7E and are switched between using ANSI escapes. Hence, while it is ASCII in its initial state, entire sequences of non-ASCII characters can be encoded with ASCII bytes.
[D] JIS X 0201 katakana are available in JIS X 0202 and ISO 2022, but not included in the basic ISO-2022-JP profile, although they are a common extension.
[E] JIS X 0212 is available in JIS X 0202 and ISO 2022, and included in the ISO-2022-JP-1 and ISO-2022-JP-2 profiles, but not in the basic ISO-2022-JP profile.
[F] Single-byte characters 0x21–7E in Shift JIS are properly ISO 646-JP, in order to be a superset of 8-bit JIS X 0201, but are often decoded (not necessarily displayed) as ASCII, which differs only in two places.
[G] Some (not all) ASCII bytes can appear as second bytes, but not first bytes, of double-byte characters in Shift JIS. Hence, in a sequence of two or more ASCII bytes, the second byte onward are necessarily ASCII (or ISO 646-JP) characters.
[H] Packed-format EUC is based on ISO 2022 mechanisms, with charset designations pre-arranged. Charset designation escapes and locking shifts are avoided, whereas use of single shifts can be implemented in a non-stateful manner. The constraints of ISO 2022 are nonetheless followed.
[I] Single-byte characters 0x21–7E in EUC-JP are generally considered ASCII, but sometimes treated as ISO 646-JP.
[J] Unlike Shift JIS, EUC-JP will not handle plain 8-bit JIS X 0201 input
without prior conversion, due to the different representation of the JIS X 0201 katakana (with single shifts).
[K] JIS X 0212 in EUC-JP is not always implemented.
[L] Besides the properties of the encodings themselves, Unicode formats have further advantages stemming from the underlying character set: they are not limited to JIS coded characters but can represent the entirety of UCS (including the full repertoire of JIS coded characters), and are hence suited to international use. They are also less badly affected by colliding proprietary extensions, due to their greater base repertoire and designated private use areas.
[M] Most bitwise frameshifts of UTF-8-encoded text will produce invalid UTF-8, but it is possible to construct sequences of characters that remain valid UTF-8 even when frameshifted by one or more bits.
[N] By Microsoft only.
[O] While GB 18030 and GBK are extensions of the EUC-CN form of GB/T 2312, they do not follow the constraints of EUC or ISO 2022 (unlike EUC-JP or the original EUC-CN).
[P] Although in theory UTF-32 is self-synchronizing over 32-bit dwords only, the use of a 32-bit value to represent a 21-bit value means that, in practice, UTF-32 contains a continuous run of at least 11 zero bits at the high end of each character, which can usually be used to align to character boundaries (depending on the code point(s) involved).

History

Within five years of a Japanese Industrial Standard being established, reaffirmed, or revised, the standard undergoes a process of reaffirmation, revision, or withdrawal. Since its establishment, this standard has been revised three times, and at present the fourth standard is valid.

First standard

The first standard is JIS C 6226-1978 "Code of Japanese Graphic Character Set for Information Interchange" (情報交換用漢字符号系, Jōhō Kōkan'yō Kanji Fugōkei), established by the Japanese Minister of International Trade and Industry on 1 January 1978. It is also called 78JIS for short. Entrusted by the Agency of Industrial Science and Technology, a JIPDEC
kanji code standardization research and study committee produced the draft. The committee chairman was Moriguchi Shigeichi.

The code included 453 non-kanji (including hiragana, katakana, the Roman, Greek, and Cyrillic alphabets, and punctuation) and 6349 kanji (2965 level 1 kanji and 3384 level 2 kanji), for a total of 6802 characters.[12] It did not yet include box-drawing characters. The standard itself was set in Shaken Co., Ltd.'s Ishii Mincho typeface.

Second standard

The second standard, JIS C 6226-1983 "Code of Japanese Graphic Character Set for Information Interchange" (情報交換用漢字符号系, Jōhō Kōkan'yō Kanji Fugōkei), revised the first standard on 1 September 1983. It is also called 83JIS. Entrusted by the AIST, a JIPDEC kanji code-related JIS committee produced the draft. The committee chairman was Motooka Tōru.

The draft of the second standard took into account factors such as the promulgation of the jōyō kanji, the enforcement of the jinmeiyō kanji, and the standardization of Japanese-language Teletex by the Ministry of Posts and Telecommunications. The following modifications were also made to keep pace with JIS C 6234-1983 (24-pixel matrix printer character forms; presently JIS X 9052).

Addition of special characters: 39 characters were added to the special characters. These were chosen, per JICST recommendations and from such standards as JIS Z 8201-1981 (mathematical symbols) and JIS Z 8202-1982 (quantity, unit, and chemical symbols), from among the symbols that could not be represented by composition.

Newly added box-drawing characters: 32 box-drawing characters were added.

Swapping of itaiji code points: Code points for 22 variant pairs of kanji were swapped such that the variant in level 2 was moved to level 1, and vice versa.[12][13] For example, level 1's row 36 cell 59 in the first standard (壺) was moved to level 2's row 52 cell 68; the point originally at row 52 cell 68 (壷) was in turn moved to row 36 cell 59.

Additions to the level 2 kanji: Three characters from level 1 and one character from level 2
were given new code points at previously unassigned code points in row 84, as level 2 kanji. Itaiji for each of those code points were newly assigned to their original locations.[14] For example, row 84 cell 1 in the second standard (堯) was moved there to accommodate a different form, not included in the first standard, at row 22 cell 38 as a level 1 kanji (尭).

Modification of character forms: The character forms of approximately 300 kanji were amended.[15]

Among the changes in those 300 or so kanji character forms, many level 1 glyphs that were in the style of the Kangxi Dictionary were changed into variants, and especially into more simplified forms (e.g. ryakuji and extended shinjitai). For example, a couple of code points that are often the subject of criticism due to being greatly changed are row 18 cell 10 (78JIS: 鷗, 83JIS: 鴎) and row 38 cell 34 (78JIS: 瀆, 83JIS: 涜). There were many smaller changes away from the Kangxi-style variants; for example, row 25 cell 84 (鵠) lost part of a stroke. Also, where some glyphs for level 2 kanji were not Kangxi-style forms, some were changed into their Kangxi-style forms; for example, row 80 cell 49 (靠) gained part of a stroke (i.e. the same part of the stroke that 25-84 lost). In order to elucidate the original intent of the first standard, these ended up falling within the parameters of the unification criteria in the fourth standard. The difference in form for the examples noted above (鵠 and 靠) falls under the parameters for unification criterion 42 (concerning the component 告).

The bulk of the changes to character forms are differences between level 1 and level 2 kanji. Specifically, simplification was done more often for level 1 kanji than for level 2 kanji; simplifications applied to level 1 kanji (e.g. 潑 to 溌 and 醱 to 醗) were not generally applied to kanji in level 2 (撥 stayed as is). The aforementioned 25-84 (鵠) and 80-49 (靠) were likewise given different treatment, as the former is in level 1 and the latter is in level 2. Even so, there were some changes regardless of the level; for instance,
characters containing the "door" (戸) and "winter" (冬) components were changed with no different treatment between level 1 and level 2 kanji.

However, for 29 code points (such as the problematic 18-10 and 38-34 mentioned above), the forms inherited by the fourth standard contradict the original intent of the first. For these, there are special unification criteria to maintain compatibility with the previous standards at these code points.

When the new "X" category for Japanese Industrial Standards (for information-related fields) was introduced, the second standard was re-termed JIS X 0208-1983[12] on 1 March 1987.

Third standard

The third standard, JIS X 0208-1990 "Code of Japanese Graphic Character Set for Information Interchange" (情報交換用漢字符号, Jōhō Kōkan'yō Kanji Fugō), revised the second standard on 1 September 1990. It is also called 90JIS for short. Entrusted by the AIST, a committee at the Japanese Standards Association (JSA) for the revision of JIS X 0208 created the draft. The committee chairman was Tajima Kazuo.

225 kanji glyphs were changed, and two characters were added to level 2 (84-05 凜 and 84-06 熙). This was a disunification of itaiji for two characters already included (49-59 凛 and 63-70 煕). Some of the changes and the two additions corresponded to the 118 jinmeiyō kanji added in March 1990.[12] The standard itself was set in Heisei Mincho.

Fourth standard

The fourth standard, JIS X 0208:1997 "7-bit and 8-bit double byte coded KANJI sets for information interchange" (7ビット及び8ビットの2バイト情報交換用符号化漢字集合, Nana-Bitto Oyobi Hachi-Bitto no Ni-Baito Jōhō Kōkan'yō Fugōka Kanji Shūgō), revised the third standard on 20 January 1997. It is also called 97JIS for short. Entrusted by the AIST, a JSA committee for research and study of coded character sets produced the draft. The committee chairman was Shibano Kōji.

The basic policies of this revision were to make no changes to the character set, to clarify ambiguous provisions, and to make the standard relatively easier to use. Addition, removal, and code point rearrangement
were not done, and without exception, the example glyphs were also left unchanged. However, the stipulations of the standard were completely rewritten and/or supplemented. Whereas the third standard was 65 pages long without the explanations, the fourth standard was 374 pages without the explanations.

The main points of the revision are:

Definition of encoding methods: Until the third standard, only the encoding method based on JIS X 0202 code extension was defined, which is unusual as far as coded character sets go. In the fourth standard, encoding methods that do not use escape sequences for the purpose of code extension were defined.

Definition of the general prohibition of the use of unassigned code points, and methods of usage for unassigned code points: The third standard, in an explanation that was not part of the standard, described things as if there were places where, for some unassigned code points, it was acceptable to assign gaiji. In the fourth standard, it was clarified that use of unassigned code points is generally prohibited. Also, the conditions for the usage of unassigned code points were specified.

General elimination of duplicate encodings: Each character was given a character name that maps to those of other standards. Also, encoding methods to use them together with ISO/IEC 646's International Reference Version or JIS X 0201 were specified. When JIS X 0208 is used together with either, only one of the two assigned code points for characters with the same name is permitted; thus, duplicate encodings were generally eliminated.

Investigation into sources of kanji: Characters included in the standard so far that are found in neither the Kangxi Dictionary nor the Dai Kan-Wa Jiten were identified. Accordingly, the purpose of their inclusion and the sources from which these kanji came during compilation of the first standard were investigated.

Definition of kanji unification criteria: Based on things such as the materials for the drafting of the first standard, an
attempt was made to restore the intent of the first standard for the scope of the glyphs each code point represents. Moreover, the criteria for unifying kanji glyphs were clearly defined.

Inclusion of de facto standards: By the time of the fourth standard, the encoding methods Shift JIS and ISO-2022-JP had become de facto standards for personal computing and e-mail, respectively. These encoding methods were included as "Shift-Coded Representation" and "RFC 1468-Coded Representation" (described above).

Successors

JIS X 0213 (extended kanji) was "designed with the goal being to offer a sufficient character set for the purposes of encoding the modern Japanese language that JIS X 0208 intended to be from the start";[16] it defines a character set that expands upon the kanji set of JIS X 0208. The drafters of JIS X 0213 recommend migration from JIS X 0208 to JIS X 0213, among the advantages being JIS X 0213's compatibility with the Hyōgai Kanji Glyph List and with newer jinmeiyō kanji.

Contrary to the expectations of the drafters, adoption of JIS X 0213 has been anything but fast since its enactment in the year 2000. The drafting committee of JIS X 0213:2004 wrote in the year 2004, "The status where 'what the majority of information systems can use in common is JIS X 0208 only' still continues." (JIS X 0213:2000/Appendix 1:2004, section 2.9.7)

For Microsoft Windows, the predominant operating system (and hence supplying the predominant desktop environment) in the personal computing sector, the JIS X 0213 repertoire has been included since Windows Vista, released in November 2006. Mac OS X has been compatible with JIS X 0213 since version 10.1 (released in 2001). Many Unix-likes such as Linux can optionally support JIS X 0213 if desired. Therefore, it is thought that with time, JIS X 0213 support on personal computers will not be
an impediment to its eventual adoption.

Among the drafters of JIS X 0213, there are those who expect to see a mix of JIS X 0208 and JIS X 0213 before any adoption of JIS X 0213 (Satō 2004). However, JIS X 0208 continues to be used for the present, and many predict it to endure as a standard. There are barriers that need to be overcome if JIS X 0213 is to supplant JIS X 0208 in common usage:

The character repertoires utilized in Japanese mobile phones at the present time are based on JIS X 0208. There are no officially announced plans whatsoever to migrate these to JIS X 0213 compatibility. As mobile phones are now a pervasive aspect of Japanese textual communication (see Japanese mobile phone culture), being a widespread, commonly accessed medium for sending e-mail and accessing the World Wide Web, a lack of adoption for mobile phones deters usage elsewhere.

JIS X 0213 is not strictly upward compatible with JIS X 0208 in terms of unification criteria (see below). For large-scale archives (e.g. bibliographic databases and Aozora Bunko) that use JIS X 0208 and follow its unification criteria strictly, it is thought that it would be extremely difficult work to both convert all the data to JIS X 0213 and preserve the same standard of textual integrity.

In practice, many systems define and use unassigned code points in JIS X 0208. For example, Windows assigns IBM and NEC extended characters and user-defined character areas (see Windows-932), and mobile phones assign emoji, in some such places. The code points of these gaiji conflict with the code points that JIS X 0213 codes use, so there would be some difficulty in migrating these systems from JIS X 0208 to JIS X 0213.

There are also plans to migrate to UCS/Unicode and use the JIS X 0213 repertoire from there, but until a system administrator is able to judge that the implementations of UCS/Unicode surrogate pairs and character compositions are sufficiently stable, he or she is likely to hesitate to use the repertoire of JIS X 0213 that requires
those implementations.

The improvements provided by JIS X 0213 are mostly in the realm of characters that are not used as often as the ones already present in JIS X 0208. Because nearly twice as many glyphs need to be implemented, and the extra glyphs see less usage, it can be a low return on investment in many cases, especially where resources are constrained.

Implementations

Because JIS X 0208 / JIS C 6226 is primarily a character set, and not a strictly defined character encoding, several companies have implemented their own encodings of the character set:

Apple Computer, Inc.: MacJapanese (Shift JIS based)
Fujitsu: JEF kanji code (EBCDIC based)
Hitachi, Ltd.: KEIS (EBCDIC based)
IBM: various, including IBM-932 and IBM-942 (both Shift JIS based)
Microsoft: Windows-932 (Shift JIS based)
NEC: JIPS

Several of these incorporate vendor-specific character assignments in place of unallocated regions of the standard. These include Windows-932 and MacJapanese, as well as NEC's PC98 character encoding. While IBM-932 and IBM-942 also include vendor assignments, they include them outside of the region used for JIS X 0208.

Relation to other standards

ISO/IEC 646 IRV and ASCII

As noted above, the kanji set is not upwardly compatible with the ISO/IEC 646:1991 IRV (ASCII) graphic character set. The kanji set and the IRV graphic character set can be used together as specified in JIS X 0208 (IRV + 7-bit code for kanji, and IRV + 8-bit code for kanji). They can be used together in EUC-JP as well.

JIS X 0201

The kanji set lacks three characters included in JIS X 0201's graphic character set for Latin characters: 2/2 (QUOTATION MARK), 2/7 (APOSTROPHE), and 2/13 (HYPHEN-MINUS). The kanji set contains all characters included in JIS X 0201's graphic character set for katakana.

The kanji set and the graphic character set for Latin characters can be used together as specified in JIS X 0208 (Latin characters + 7-bit code for kanji, and Latin characters + 8-bit code for kanji). The kanji set, graphic character set for
Latin characters, and JIS X 0201's graphic character set for katakana can be used together as specified in JIS X 0208 (the shift-coded character set, i.e. Shift JIS). The kanji set and graphic character set for katakana can be used together in EUC-JP.

JIS X 0212

JIS X 0212 (supplementary kanji) defines additional characters with code points, for the purposes of information processing that requires characters not found in JIS X 0208. Rather than allocating characters within the main JIS X 0208 kanji set, it defines a second 94-by-94 kanji set containing supplementary characters. JIS X 0212 can be used with JIS X 0208 in EUC-JP. Also, JIS X 0208 and JIS X 0212 are both source standards for UCS/Unicode's Han unification, meaning that kanji from both sets can be included in one Unicode-format document.

Among the code points that the second version of JIS X 0208 changed, 28 code points in JIS X 0212 reflect the character forms from before the changes.[17] Also, JIS X 0212 reassigns the "closure mark" that JIS X 0208 had assigned as a non-kanji (〆 at row 1 cell 26) as a kanji (乄 at row 16 cell 17). JIS X 0212 has no characters in common with JIS X 0208 other than these. Hence, it is not suited for general use on its own.

However, in the fourth version of JIS X 0208, the connection to JIS X 0212 was not defined at all. It is believed that this is because the drafting committee of the fourth JIS X 0208 standard had a critical opinion of the selection and identification methods of JIS X 0212.[18] The character meanings and selection rationales were not properly documented, making it difficult to identify whether desired kanji corresponded to those in its repertoire.[19] The text of the fourth standard, as well as pointing out the problematic points of the character selection of JIS X 0212, states that "it is thought that not only is character selection impossible, it is also impossible to use together; the connection to JIS X 0212 is not defined at all." (section 3.3.1)

JIS X 0213

Euler diagram comparing
repertoires of JIS X 0208, JIS X 0212, JIS X 0213, Windows-31J (the Microsoft standard repertoire), and Unicode.

JIS X 0213 (extension kanji) defines a kanji set that expands upon the kanji set of JIS X 0208. According to this standard, it is "designed with the goal being to offer a sufficient character set for the purposes of encoding the modern Japanese language that JIS X 0208 intended to be from the start".[16]

The kanji set of JIS X 0213 incorporates all characters that can be represented in the kanji set of JIS X 0208, with many additions. In total, JIS X 0213 defines 1183 non-kanji and 10,050 kanji (for a total of 11,233 characters), within two 94-by-94 planes (面, men). The first plane (non-kanji and level 1–3 kanji) is based on JIS X 0208, whereas the second plane (level 4 kanji) is designed to fit within the unallocated rows of JIS X 0212, allowing use in EUC-JP.[20] JIS X 0213 also defines Shift_JISx0213, a variant of Shift JIS capable of encoding the entirety of JIS X 0213.

For most intents and purposes, JIS X 0213 plane 1 is a superset of JIS X 0208. However, different unification criteria are applied to some code points in JIS X 0213 compared to JIS X 0208. Consequently, some pairs of kanji glyphs that were represented by one JIS X 0208 code point due to being unified are given separate code points in JIS X 0213. For example, the glyph at row 33 cell 46 of JIS X 0208 (僧, described above) unifies a few variants due to its right-hand component. In JIS X 0213, two forms (the ones containing the component 丷) are unified on plane 1 row 33 cell 46, and the other (containing the component 八) is located at plane 1 row 14 cell 41. Therefore, whether JIS X 0208 row 33 cell 46 should be mapped to JIS X 0213 plane 1 row 33 cell 46 or plane 1 row 14 cell 41 cannot be determined automatically.[t] This limits the extent to which JIS X 0213 can be considered upwardly compatible with JIS X 0208, as admitted by the JIS X 0213 drafting committee.[21]

However, for the most part, row m cell n in JIS X 0208 corresponds to plane 1 row
m cell n in JIS X 0213; therefore, not much confusion arises in practice. This is because most typefaces have come to use the glyphs exemplified in JIS X 0208, and most users are not consciously aware of the unification criteria.

ISO/IEC 10646 and Unicode

The kanji set of JIS X 0208 is among the original source standards for the Han unification in ISO/IEC 10646 (UCS) and Unicode. Every kanji in JIS X 0208 corresponds to its own code point in UCS/Unicode's Basic Multilingual Plane (BMP).

The non-kanji in JIS X 0208 also correspond to their own code points in the BMP. However, for some special characters, some systems implement correspondences different from those of UCS/Unicode (which are based on the character names given in JIS X 0208:1997).

Footnotes

Explanatory

[a] Withdrawn.
JIS and Apple: U+2014. Unicode,[a] Microsoft and WHATWG: U+2015.
Microsoft and WHATWG: U+FF5E. Unicode,[a] JIS and Apple: U+301C.
Microsoft and WHATWG: U+2225. Unicode,[a] JIS and Apple: U+2016.
Microsoft: U+FF0D. Unicode,[a] JIS and Apple: U+2212. WHATWG: U+FF0D on decoding, exceptionally both on encoding.
Added in JIS X 0213.
Absent in original version of extension, which predates the Heisei era.
Code position selected by either NEC or Microsoft.[5]
Not in Macintosh PostScript.
Duplicated by additions made to row 2 in 1983.
Not encoded here (but left unallocated) in JIS X 0213,[5] but duplicate-encoded here by Microsoft and WHATWG.
As for the Macintosh PostScript encoding, a Private Use U+F87F is appended to the form decoded with the macOS library functions, to allow round-tripping.
As shown in the code tables registered at the International Register of Coded Character Sets To Be Used With Escape Sequences, prior to the fourth standard (1997), the ku (区) and ten (点) were called "section" and "position", respectively, in English. As to the background of the change in the English: in the JIS X 0221-1995 UCS standard that translated ISO/IEC 10646-1:1993, "group", "plane", "row", and "cell" can be translated into gun (群), men (面), ku (区), and
ten (点). However, the row and cell of JIS X 0208 and the row and cell of the UCS are different ideas.
Character names are given in Roman letters and are used internationally, so they can be considered an international convention, somewhat like the scientific names of living organisms. In regard to this analogy, the Japanese common names for the characters would be like using common names for organisms.
For a fully featured kana-order search or sort, word readings, repetition marks, and so forth must be taken into account. The sorting of Japanese character strings is prescribed in JIS X 4061 "Collation of Japanese character strings".
According to Yasuoka (2001a), it seems there were some accidental oversights. He notes, for example, that the ba (旛, 58-57) of Inba and the shi (泗, 61-89) of Shisui, Kumamoto are not part of level 1.
List: 丼 傲 刹 哺 喩 嗅 嘲 毀 彙 恣 惧 慄 憬 拉 摯 曖 楷 鬱 璧 瘍 箋 籠 緻 羞 訃 諧 貪 踪 辣 錮
The jōyō kanji 𠮟 is included only in its official variant form (叱).
List: 乘 亞 佛 侑 來 俐 傳 僞 價 儉 兒 凉 凛 凰 剩 劍 勁 勳 卷 單 嚴 圈 國 圓 團 壞 壘 壯 壽 奎 奧 奬 孃 實 寢 將 專 峽 崚 巖 巫 已 帶 廣 廳 彈 彌 彗 從 徠 恆 惡 惠 惺 愼 應 懷 戰 戲 拔 拜 拂 搜 搖 攝 收 敍 昊 昴 晏 晄 晝 晨 晟 暉 曉 檜 栞 條 梛 椰 榮 樂 樣 橙 檢 櫂 櫻 盜 毬 氣 洸 洵 淨 渾 滉 漱 滯 澁 澪 濕 煌 燒 燎 燿 爭 爲 狹 默 獸 珈 珀 琥 瑶 疊 皓 盡 眞 眸 碎 祕 祿 禪 禮 稟 稻 穗 穰 穹 笙 粹 絆 綺 綸 縣 縱 纖 羚 翔 飜 聽 脩 臟 與 苺 茉 莊 莉 菫 萠 萬 蕾 藏 藝 藥 衞 裝 覽 詢 諄 謠 讓 賣 赳 轉 迪 逞 醉 釀 釉 鎭 鑄 陷 險 雜 靜 頌 顯 颯 騷 驍 驗 髮 鷄 麒 黎 齊 堯 槇 遙 凜 熙
List: 焰 鷗 俠 繫 繡 渚 蔣 醬 蟬 琢 簞 摑 顚 禱 萊 蠟 增 德 橫 瀨 猪 神 祥 福 綠 緖 薰 諸 賴 郞 都 黑 逸 謁 緣 黃 溫 禍 悔 海 渴 漢 器 祈 虛 響 勤 謹 揭 擊 穀 祉 視 煮 社 者 臭 祝 暑 署 涉 狀 節 祖 僧 層 巢 憎 贈 卽 嘆 著 徵 禎 突 難 梅 繁 晚 卑 碑 賓 敏 侮 勉 步 墨 每 祐 欄 虜 淚 類 曆 歷 練 鍊 錄 俱 瘦 吞 寬 廊 朗 懲
For row 19, cells 30 and 31, the order is mixed up for their representative readings. Consequently, where the correct order should be kaeru (蛙, "frog") followed by kaori (馨, "aroma"), their positions are transposed so that kaori precedes kaeru.
In addition, the primarily used variant (剣) is at row 23 cell 85 on level 1, and one other variant (釼) can be found grouped as having the "gold" radical at row 78 cell 63 on level 2.
The question of which glyphs within the unification
criteria are to be used is left to the type designer. Depending on that and the end user's circumstances, it is possible that neither, both, or one or the other of these two will follow their Kangxi-style form.
This is the same uncertainty as to whether the HYPHEN-MINUS in ISO/IEC 646 should be mapped to HYPHEN or MINUS SIGN in JIS X 0208.

Reference footnotes

1. "Why Japan didn't create the iPod". Gatunka. 5 May 2008.
2. JIS X 0208 was not one of the standards included in the list of applicable target systems for display of the new JIS mark, announced by the Ministry of Economy, Trade and Industry on 17 January 2007.
3. Steele, Shawn (15 April 1998). "CP932.TXT: cp932 to Unicode table". Microsoft. Codes in Shift JIS format: SJIS 0x815C is 1-29 (JIS 0x213D); SJIS 0x817C is 1-61 (JIS 0x215D).
4. "Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later". Apple. Codes in Shift JIS format: SJIS 0x815C is 1-29 (JIS 0x213D); SJIS 0x817C is 1-61 (JIS 0x215D).
5. Lunde, Ken (21 March 2019). "A Brief History of Japan's Era Name Ligatures". CJK Type Blog. Adobe Inc.
6. Japanese Industrial Standard Committee. ISO-IR-233: Japanese Graphic Character Set for Information Interchange, Plane 1 (Update of ISO-IR 228) (PDF). ITSCJ/IPSJ.
7. Unicode, Inc. (14 October 2011). "JIS X 0208 (1990) to Unicode".
8. van Kesteren, Anne. "Index jis0208". Encoding Standard. WHATWG.
9. Jungshik Shin (14 October 2011). "KSX1001.TXT: KS X 1001 to Unicode table". Unicode, Inc.
10. JIS C 6225-1979, control character codes for the purpose of the Japanese graphic character set for information interchange, provided control characters for the beginning and end of composition. JIS C 6225 was re-termed JIS X 0207 in 1987, and was withdrawn in 1997.
11. In the IANA character sets, Shift_JIS is defined by referring to JIS X 0208:1997 Appendix 1.
12. "15 History of JIS X 0208" (PDF). IBM Japanese Graphic Character Set for Extended UNIX Code (EUC). IBM. p. 371. Archived (PDF) from the original on 8 December 2017. Retrieved 8 December 2017.
13. Lunde, Ken. "Appendix Q: 78-vs-83-3". CJKV Information Processing (supplementary material). O'Reilly. Note inclusion of kuten codes (with hyphen omitted).
14. Lunde, Ken. "Appendix Q: 78-vs-83-2". CJKV Information Processing (supplementary material). O'Reilly. Note inclusion of kuten codes (with hyphen omitted).
15. According to Nomura (1984), the number of character forms changed, including moves between code points, is 294. According to Shibano (1997a) and the text of the fourth standard, the number of character forms changed is 300.
16. Original Japanese: 「JIS X 0208が当初符号化を意図していた現代日本語を符号化するために十分な文字集合を提供することを目的として設計された」
17. Lunde, Ken. "Appendix Q: TJ2". CJKV Information Processing (supplementary material). O'Reilly. Note inclusion of kuten codes (with hyphen omitted).
18. For example, Shibano Kōji (1997a), who served as the chairman of the drafting committee for the fourth standard, stated these about the selection method: "It is based on a superficial understanding of JIS X 0208's character set selection; it is a mistaken understanding" (original Japanese: 「JIS X 0208の文字集合選定の表層的理解に基づくものであり、間違った理解である」), and "There is a big problem in investigating all of a character set that exceeds 10000 characters" (original Japanese: 「1万字を越える水準の文字集合の検討としては、大きな問題がある」).
19. Marukawa Kazushi. "JIS Character Sets: JIS X 0212-1990". Archived from the original on 22 May 2005.
20. Chang, Hyeshik (31 October 2021). "Readme for CJKCodecs". cPython. Python Software Foundation.
21. JIS X 0213:2000, section 5.3.2; JIS X 0213:2000/Appendix 1:2004, section 3.2.2.

See also

JIS coded character sets:
JIS X 0201: 7-bit and 8-bit coded character sets for information interchange
JIS X 0202: Information technology, character code structure and extension techniques (ISO/IEC 2022)
JIS X 0208: 7-bit and 8-bit double byte coded KANJI sets for information interchange
JIS X 0211: Control functions for coded character sets (ISO/IEC 6429)
JIS X 0212: Code of the supplementary Japanese graphic character set for information interchange
JIS X 0213: 7-bit and 8-bit double byte coded extended KANJI sets for information interchange
JIS X 0221: Universal Multiple-Octet Coded Character Set (UCS) (ISO/IEC 10646)
Extended shinjitai
Help:Japanese

References

For the purposes of citation, these Japanese names are presented as if they were in Western order where Romanized, and retain Eastern order where not.

Nishimura Hirohiko (西村 恕彦) (1978). "The Kanji JIS" (漢字のJIS). Standardization Journal (標準化ジャーナル) 171 3 8.
Nomura Masaaki (野村 雅昭) (1984). "Revision of JIS C 6226 Kanji codes for information interchange" (JIS C 6226 情報交換用漢字符号系の改正). Standardization Journal (標準化ジャーナル) 14(3): 4–9.
Ogata Katsuhiro (小形 克宏) (2006a). "Things that were not unified in 97JIS among the example glyphs changed in JIS C 6226-1983 (83JIS)" (JIS C 6226-1983(83JIS)で例示字体を変更したうち、97JISで包摂とされなかったもの) [permanent dead link], accessed 29 January 2007.
Ogata Katsuhiro (小形 克宏) (2006b). "Things that fell within the scope of unification among the example glyphs changed in JIS C 6226-1983 (83JIS)" (JIS C 6226-1983(83JIS)例示字体変更のうち、包摂の範囲内だったもの) [permanent dead link], accessed 29 January 2007.
Satō Takayuki (佐藤 敬幸) (2004). "Concerning the revision of JIS X 0213 (7-bit and 8-bit double byte coded extended Kanji sets for information interchange)" (JIS X 0213(7ビット及び8ビットの2バイト情報交換用符号化拡張漢字集合)の改正について). Standardization Journal (標準化ジャーナル) 34(4): 8–12.
Shibano Kōji (芝野 耕司) (1997a). "Concerning the revision of JIS X 0208 (7-bit and 8-bit double byte coded Kanji sets for information interchange)" (JIS X0208(7ビット及び8ビットの2バイト情報交換用符号化漢字集合)の改正について). Standardization Journal (標準化ジャーナル) 27(3): 8–12.
Shibano Kōji (芝野 耕司) (1997b). "Plan for the extension of the JIS kanji" (JIS漢字の拡張計画). Standardization Journal (標準化ジャーナル) 27(7): 5–11.
Shibano Kōji (芝野 耕司) (2000). "Establishment of JIS X 0213 (7-bit and 8-bit double byte coded extended Kanji sets for information interchange)" (JIS X 0213(7ビット及び8ビットの2バイト情報交換用符号化拡張漢字集合)の制定). Standardization Journal (標準化ジャーナル) 30(3): 3–7.
Shibano Kōji (芝野 耕司) (2001). "Concerning JIS kanji" (漢字について). Standardization and Quality Control (標準化と品質管理) 54(8): 44–50.
Shibano Kōji (芝野 耕司) (editor) (2002). JIS Kanji Dictionary, enlarged and revised edition (増補改訂 JIS漢字字典). Tokyo: Japanese Standards Association. ISBN 4-542-20129-5.
Shibano Kōji (芝野 耕司) (2002). "The development of kanji and Japanese language processing technologies: the standardization of kanji codes" (漢字・日本語処理技術の発展:漢字コードの標準化). IPSJ Magazine (情報処理) 43(12): 1362–1367.
Tajima Kazuo (田嶋 一夫) (1979). "Problems concerning the use of the JIS kanji listing: design and handling of kanji in kanji processing systems" (JIS漢字表の利用上の問題:漢字処理システムにおける漢字のデザインと管理). Journal of Information Processing Society of Japan (情報管理) 21(10): 753–761.
Uchida Tomio (内田 富雄) (1990). "Establishment of JIS X 0212 (Kanji Codes for Information Interchange: Supplemental Kanji)" (JIS X 0212(情報交換用漢字符号-補助漢字)の制定). Standardization Journal (標準化ジャーナル) 20(11): 6–11.
Yasuoka Kōichi (安岡 孝一) (2001a). "Situation of the Newest Character Codes in Japan (former part)" (日本における最新文字コード事情(前編)). Systems, Control and Information (システム/制御/情報) 45(9): 528–535.
Yasuoka Kōichi (安岡 孝一) (2001b). "Situation of the Newest Character Codes in Japan (latter part)" (日本における最新文字コード事情(後編)). Systems, Control and Information (システム/制御/情報) 45(12): 687–694.
Yasuoka Kōichi (安岡 孝一) (2006). "Differences between the JIS kanji plan (1976) and JIS C 6226-1978" (JIS漢字案(1976)とJIS C 6226-1978の異同), at the 17th "Computer Usage for Oriental Studies" (東洋学へのコンピュータ利用) research seminar, pp. 3–51.
Yasuoka Kōichi (安岡 孝一) & Motoko Yasuoka (安岡 素子) (2006). The History of Character Codes: Europe, America, and Japan (文字符号の歴史 欧米と日本編). Tokyo: Kyōritsu Shuppan. ISBN 4-320-12102-3.

External links

Look up Japanese kanji by JIS X 0208 kuten code in Wiktionary, the free dictionary.
The International Register (that the IPSJ/ITSCJ supervises):
Japanese Character Set JIS C 6226-1978
Japanese Character Set JIS C 6226-1983
Update Registration 87: Japanese Graphic Character Set for Information Interchange
(in Japanese) Japanese Industrial Standards Committee: database search (the latest standard may be read here)
(in Japanese) Japanese Standards Association: database search (a copy of the latest standard may be purchased here)
(in Japanese) Unification-related provisions in the JIS X 0208 and 0213 standards
(in Japanese) Cyber Librarian: JIS kanji listing

Retrieved from "https://en.wikipedia.org/w/index.php?title=JIS_X_0208&oldid=1154511497"