Western Latin character sets (computing)

Several 8-bit character sets (encodings) were designed for binary representation of common Western European languages (Italian, Spanish, Portuguese, French, German, Dutch, English, Danish, Swedish, Norwegian, and Icelandic), which use the Latin alphabet, a few additional letters and ones with precomposed diacritics, some punctuation, and various symbols (including some Greek letters). These character sets also happen to support many other languages such as Malay, Swahili, and Classical Latin.

This material is technically obsolete, having been functionally replaced by Unicode. However, it continues to have historical interest.

Summary

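The ISO 8859 series of 8-bit character sets covers all Latin-script alphabets used in Europe, but the same code point means different things in different sets, which caused difficulties such as mojibake (garbled characters) and communication issues. The arrival of Unicode, with a unique code point for every character, resolved these issues.

  • ISO/IEC 8859-1, or Latin-1, is the most used and also defines the first 256 code points in Unicode.
  • ISO/IEC 8859-15 modifies ISO 8859-1 to fully support Estonian, Finnish and French and to add the euro sign.
  • Windows-1252 is a superset of ISO 8859-1 that includes the printable characters from ISO/IEC 8859-15 and popular punctuation such as curved quotation marks (also known as smart quotes, as in Microsoft Word settings and similar programs). Web page tools for Windows commonly use Windows-1252 but label the page as ISO 8859-1; HTML5 addresses this by mandating that pages labeled as ISO 8859-1 be interpreted as Windows-1252.
  • IBM CP437, being intended for English only, has very little in the way of accented letters (particularly uppercase) but has far more graphics characters than the other IBM code pages listed here, and also some mathematical and Greek characters that are useful as technical symbols.
  • IBM CP850 has all the printable characters that ISO 8859-1 has (albeit arranged differently) and still has enough graphics characters to build a usable text-mode user interface.
  • IBM CP858 differs from CP850 by only one character: the dotless i (ı), rarely used outside Turkey and with no uppercase equivalent provided, was replaced by the euro sign.[1]
  • IBM CP859 contains all the printable characters that ISO/IEC 8859-15 has, so unlike CP850 it supports the euro sign, Estonian, Finnish and French.
  • IBM code pages 037, 500, and 1047 are EBCDIC encodings that include all of the ISO 8859-1 characters.
  • The Mac OS Roman character set (often referred to as MacRoman and known by the IANA as simply MACINTOSH) has most, but not all, of the same characters as ISO/IEC 8859-1 in a very different arrangement. It also adds many technical and mathematical characters (though it lacks the important multiplication sign) and more diacritics. Older Macintosh web browsers were known to munge the few characters that were in ISO/IEC 8859-1 but not in their native Macintosh character set when editing text from Web sites. Conversely, in Web material prepared on an older Macintosh, many characters were displayed incorrectly when read by other operating systems.
  • The Macintosh Latin encoding, a modification of Mac OS Roman to support ISO/IEC 8859-1, was created by the creators of the Kermit protocol to solve this problem.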
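The ambiguity described above can be reproduced with a short Python sketch. It assumes Python's standard-library codec names (`latin-1`, `iso8859-15`, `cp1252`, `cp437`, `cp850`, `mac_roman`) as stand-ins for the character sets discussed here, and the helper name `decode_all` is purely illustrative. Python's mapping tables may differ in small details from a given vendor's.

```python
# Decode the same raw bytes under several legacy 8-bit character sets.
# The codec names are Python's own spellings (an assumption of this sketch).
CODECS = ["latin-1", "iso8859-15", "cp1252", "cp437", "cp850", "mac_roman"]


def decode_all(raw: bytes) -> dict:
    """Return the text each codec produces for the same raw bytes."""
    return {name: raw.decode(name, errors="replace") for name in CODECS}


if __name__ == "__main__":
    # 0xE9 is "é" in the ISO and Windows sets but a different letter elsewhere.
    for name, text in decode_all(b"caf\xe9").items():
        print(f"{name:>10}: {text}")

    # Classic mojibake: UTF-8 bytes misread as Windows-1252.
    print("é".encode("utf-8").decode("cp1252"))
```

Only the label attached to the bytes decides which letter appears, which is why unlabeled or mislabeled text was so easily garbled in interchange.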

History

The earlier seven-bit U.S. American Standard Code for Information Interchange (ASCII) has characters sufficient to properly represent only a few languages, such as English, Latin, Malay and Swahili. It lacks some letters and letter-diacritic combinations used in other Latin-alphabet languages. However, since there was no other choice on most US-supplied computer platforms, use of ASCII was unavoidable except where there was a strong national computing industry. The ISO 646 group of encodings replaced some of the symbols in ASCII with local characters, but space was very limited, and some of the replaced symbols were quite common in contexts such as programming languages.

Most computers internally used eight-bit bytes but communication (seen as inherently unreliable) used seven data bits plus one parity bit. In time, it became common to use all eight bits for data, creating space for another 128 characters. In the early days most of these were system specific, but gradually the ISO/IEC 8859 standards emerged to provide some cross-platform similarity to enable information interchange.
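As a minimal sketch of the seven-data-bits-plus-parity arrangement mentioned above, the following Python functions pack a 7-bit code into a byte with a parity bit. Even parity and placement of the parity bit in the top bit are assumptions made for illustration (the text does not say which convention was used), and the function names are hypothetical.

```python
def add_even_parity(code: int) -> int:
    """Pack a 7-bit code and an even-parity bit (bit 7) into one byte."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit value")
    parity = bin(code).count("1") % 2  # 1 when the number of set bits is odd
    return code | (parity << 7)


def check_even_parity(byte: int) -> bool:
    """Return True when the byte contains an even number of set bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0


if __name__ == "__main__":
    sent = add_even_parity(ord("C"))
    print(hex(sent), check_even_parity(sent))
    # Flipping any single bit in transit makes the check fail.
    print(check_even_parity(sent ^ 0x01))
```

Once the eighth bit is given over to data instead, the same byte can carry 256 distinct values, which is the extra space the 8-bit character sets filled.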

Towards the end of the 20th century, as storage and memory costs fell, the problems caused by a single eight-bit code having multiple meanings (there are seven ISO-Latin code sets alone) ceased to be justified. All major operating systems have since moved to Unicode as their main internal representation. However, because Windows did not support the UTF-8 method of encoding Unicode (preferring UTF-16), many applications continued to be restricted to these legacy character sets.

The euro sign

The introduction of the euro and its associated euro sign (€) put significant pressure on computer systems developers to support the new symbol, and most 8-bit character sets had to be adapted in some way.

  • Apple with MacRoman and Sun Microsystems with Solaris OS simply replaced the generic currency sign (¤) with the euro sign. This caused difficulty in some places because organisations had found other uses for that code point, such as a company logo.
  • ISO introduced a further variant of ISO 8859, ISO 8859-15, which replaced the generic currency sign with the euro sign as well as making some other replacements of symbols with letters with diacritics. ISO 8859-15 never received widespread adoption.
  • With Windows-1252, Microsoft placed the euro sign in a gap (position 0x80) in the existing C1 control codes, a decision that other vendors considered counter-architectural.

Whilst these decisions had limited effect on documents that were only used within a single computer (or at least within a single vendor's "digital ecosystem"), they meant that documents containing a euro sign would fail to render as expected when interchanged between ecosystems.

All of these issues have been resolved as operating systems have been upgraded to support Unicode as standard, which encodes the euro sign at U+20AC (decimal 8364).
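The divergent placements described in this section can be checked with a small Python sketch. It assumes Python's standard-library codecs (`iso8859-15`, `cp1252`, `cp850`, `cp858`, `mac_roman`, `latin-1`) as approximations of the vendor character sets, and `euro_byte` is an illustrative helper name.

```python
EURO = "\u20ac"  # EURO SIGN

# Python's codec names; cp858 is its name for code page 858.
CODECS = ["latin-1", "iso8859-15", "cp1252", "cp850", "cp858", "mac_roman"]


def euro_byte(codec: str):
    """Return the euro sign's byte as two hex digits, or None if unencodable."""
    try:
        return EURO.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    print("code point:", ord(EURO))
    for codec in CODECS:
        print(f"{codec:>10}: {euro_byte(codec)}")

    # The interchange problem: one byte, two currency symbols.
    print(b"\xa4".decode("iso8859-15"), b"\xa4".decode("latin-1"))
```

A byte written as the euro sign under one of these sets is therefore read as a different character, or as a control code, under another.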

Comparison table

Code points U+0000 to U+007F are not shown in this table, as they map directly to the same byte values in all character sets listed here. The ASCII standard is the original specification for these first 128 characters (0–127).
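The claim that the lower half is shared can be spot-checked in Python. This is only a sketch: it relies on Python's bundled codecs, which are assumed here to match the character sets in the table, and `shares_ascii_range` is a hypothetical helper name.

```python
# Python codec names standing in for the six character sets in the table.
CODECS = ["latin-1", "iso8859-15", "cp1252", "cp437", "cp850", "mac_roman"]

ASCII_BYTES = bytes(range(128))          # byte values 0x00 through 0x7F
REFERENCE = ASCII_BYTES.decode("ascii")  # U+0000 through U+007F


def shares_ascii_range(codec: str) -> bool:
    """True when the codec decodes bytes 0x00-0x7F exactly as ASCII does."""
    return ASCII_BYTES.decode(codec) == REFERENCE


if __name__ == "__main__":
    for codec in CODECS:
        print(f"{codec:>10}: {shares_ascii_range(codec)}")
```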

The table is arranged by Unicode code point. Character sets are referred to here by their IANA names in upper case.

Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
NBSP U+00A0 A0 A0 A0 FF FF CA
¡ U+00A1 A1 A1 A1 AD AD C1
¢ U+00A2 A2 A2 A2 9B BD A2
£ U+00A3 A3 A3 A3 9C 9C A3
¤ U+00A4 A4   A4   CF  
¥ U+00A5 A5 A5 A5 9D BE B4
¦ U+00A6 A6   A6   DD  
§ U+00A7 A7 A7 A7   F5 A4
¨ U+00A8 A8   A8   F9 AC
© U+00A9 A9 A9 A9   B8 A9
ª U+00AA AA AA AA A6 A6 BB
« U+00AB AB AB AB AE AE C7
¬ U+00AC AC AC AC AA AA C2
SHY U+00AD AD AD AD   F0  
® U+00AE AE AE AE   A9 A8
¯ U+00AF AF AF AF   EE F8
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
° U+00B0 B0 B0 B0 F8 F8 A1
± U+00B1 B1 B1 B1 F1 F1 B1
² U+00B2 B2 B2 B2 FD FD  
³ U+00B3 B3 B3 B3   FC  
´ U+00B4 B4   B4   EF AB
µ U+00B5 B5 B5 B5 E6 E6 B5
¶ U+00B6 B6 B6 B6   F4 A6
· U+00B7 B7 B7 B7 FA FA E1
¸ U+00B8 B8   B8   F7 FC
¹ U+00B9 B9 B9 B9   FB  
º U+00BA BA BA BA A7 A7 BC
» U+00BB BB BB BB AF AF C8
¼ U+00BC BC   BC AC AC  
½ U+00BD BD   BD AB AB  
¾ U+00BE BE   BE   F3  
¿ U+00BF BF BF BF A8 A8 C0
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
À U+00C0 C0 C0 C0   B7 CB
Á U+00C1 C1 C1 C1   B5 E7
 U+00C2 C2 C2 C2   B6 E5
à U+00C3 C3 C3 C3   C7 CC
Ä U+00C4 C4 C4 C4 8E 8E 80
Å U+00C5 C5 C5 C5 8F 8F 81
Æ U+00C6 C6 C6 C6 92 92 AE
Ç U+00C7 C7 C7 C7 80 80 82
È U+00C8 C8 C8 C8   D4 E9
É U+00C9 C9 C9 C9 90 90 83
Ê U+00CA CA CA CA   D2 E6
Ë U+00CB CB CB CB   D3 E8
Ì U+00CC CC CC CC   DE ED
Í U+00CD CD CD CD   D6 EA
Î U+00CE CE CE CE   D7 EB
Ï U+00CF CF CF CF   D8 EC
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
Ð U+00D0 D0 D0 D0   D1  
Ñ U+00D1 D1 D1 D1 A5 A5 84
Ò U+00D2 D2 D2 D2   E3 F1
Ó U+00D3 D3 D3 D3   E0 EE
Ô U+00D4 D4 D4 D4   E2 EF
Õ U+00D5 D5 D5 D5   E5 CD
Ö U+00D6 D6 D6 D6 99 99 85
× U+00D7 D7 D7 D7   9E  
Ø U+00D8 D8 D8 D8   9D AF
Ù U+00D9 D9 D9 D9   EB F4
Ú U+00DA DA DA DA   E9 F2
Û U+00DB DB DB DB   EA F3
Ü U+00DC DC DC DC 9A 9A 86
Ý U+00DD DD DD DD   ED  
Þ U+00DE DE DE DE   E8  
ß U+00DF DF DF DF E1 E1 A7
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
à U+00E0 E0 E0 E0 85 85 88
á U+00E1 E1 E1 E1 A0 A0 87
â U+00E2 E2 E2 E2 83 83 89
ã U+00E3 E3 E3 E3   C6 8B
ä U+00E4 E4 E4 E4 84 84 8A
å U+00E5 E5 E5 E5 86 86 8C
æ U+00E6 E6 E6 E6 91 91 BE
ç U+00E7 E7 E7 E7 87 87 8D
è U+00E8 E8 E8 E8 8A 8A 8F
é U+00E9 E9 E9 E9 82 82 8E
ê U+00EA EA EA EA 88 88 90
ë U+00EB EB EB EB 89 89 91
ì U+00EC EC EC EC 8D 8D 93
í U+00ED ED ED ED A1 A1 92
î U+00EE EE EE EE 8C 8C 94
ï U+00EF EF EF EF 8B 8B 95
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
ð U+00F0 F0 F0 F0   D0  
ñ U+00F1 F1 F1 F1 A4 A4 96
ò U+00F2 F2 F2 F2 95 95 98
ó U+00F3 F3 F3 F3 A2 A2 97
ô U+00F4 F4 F4 F4 93 93 99
õ U+00F5 F5 F5 F5   E4 9B
ö U+00F6 F6 F6 F6 94 94 9A
÷ U+00F7 F7 F7 F7 F6 F6 D6
ø U+00F8 F8 F8 F8   9B BF
ù U+00F9 F9 F9 F9 97 97 9D
ú U+00FA FA FA FA A3 A3 9C
û U+00FB FB FB FB 96 96 9E
ü U+00FC FC FC FC 81 81 9F
ý U+00FD FD FD FD   EC  
þ U+00FE FE FE FE   E7  
ÿ U+00FF FF FF FF 98 98 D8
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
ı U+0131         D5 F5
Œ U+0152   BC 8C     CE
œ U+0153   BD 9C     CF
Š U+0160   A6 8A      
š U+0161   A8 9A      
Ÿ U+0178   BE 9F     D9
Ž U+017D   B4 8E      
ž U+017E   B8 9E      
ƒ U+0192     83 9F 9F C4
ˆ U+02C6     88     F6
ˇ U+02C7           FF
˘ U+02D8           F9
˙ U+02D9           FA
˚ U+02DA           FB
˛ U+02DB           FE
˜ U+02DC     98     F7
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
˝ U+02DD           FD
Γ U+0393       E2    
Θ U+0398       E9    
Σ U+03A3       E4    
Φ U+03A6       E8    
Ω U+03A9       EA   BD
α U+03B1       E0    
δ U+03B4       EB    
ε U+03B5       EE    
π U+03C0       E3   B9
σ U+03C3       E5    
τ U+03C4       E7    
φ U+03C6       ED    
– U+2013     96     D0
— U+2014     97     D1
‗ U+2017         F2  
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
‘ U+2018     91     D4
’ U+2019     92     D5
‚ U+201A     82     E2
“ U+201C     93     D2
” U+201D     94     D3
„ U+201E     84     E3
† U+2020     86     A0
‡ U+2021     87     E0
• U+2022     95     A5
… U+2026     85     C9
‰ U+2030     89     E4
‹ U+2039     8B     DC
› U+203A     9B     DD
⁄ U+2044           DA
ⁿ U+207F       FC    
₧ U+20A7       9E    
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
€ U+20AC   A4 80   (D5)[nb 1][2][3] DB
™ U+2122     99     AA
∂ U+2202           B6
∆ U+2206           C6
∏ U+220F           B8
∑ U+2211           B7
∙ U+2219       F9    
√ U+221A       FB   C3
∞ U+221E       EC   B0
∩ U+2229       EF    
∫ U+222B           BA
≈ U+2248       F7   C5
≠ U+2260           AD
≡ U+2261       F0    
≤ U+2264       F3   B2
≥ U+2265       F2   B3
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
⌐ U+2310       A9    
⌠ U+2320       F4    
⌡ U+2321       F5    
─ U+2500       C4 C4  
│ U+2502       B3 B3  
┌ U+250C       DA DA  
┐ U+2510       BF BF  
└ U+2514       C0 C0  
┘ U+2518       D9 D9  
├ U+251C       C3 C3  
┤ U+2524       B4 B4  
┬ U+252C       C2 C2  
┴ U+2534       C1 C1  
┼ U+253C       C5 C5  
═ U+2550       CD CD  
║ U+2551       BA BA  
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
╒ U+2552       D5    
╓ U+2553       D6    
╔ U+2554       C9 C9  
╕ U+2555       B8    
╖ U+2556       B7    
╗ U+2557       BB BB  
╘ U+2558       D4    
╙ U+2559       D3    
╚ U+255A       C8 C8  
╛ U+255B       BE    
╜ U+255C       BD    
╝ U+255D       BC BC  
╞ U+255E       C6    
╟ U+255F       C7    
╠ U+2560       CC CC  
╡ U+2561       B5    
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
╢ U+2562       B6    
╣ U+2563       B9 B9  
╤ U+2564       D1    
╥ U+2565       D2    
╦ U+2566       CB CB  
╧ U+2567       CF    
╨ U+2568       D0    
╩ U+2569       CA CA  
╪ U+256A       D8    
╫ U+256B       D7    
╬ U+256C       CE CE  
▀ U+2580       DF DF  
▄ U+2584       DC DC  
█ U+2588       DB DB  
▌ U+258C       DD    
▐ U+2590       DE    
Character Code point ISO-8859-1 ISO-8859-15 WINDOWS-1252 IBM437 IBM850 MACINTOSH
░ U+2591       B0 B0  
▒ U+2592       B1 B1  
▓ U+2593       B2 B2  
■ U+25A0       FE FE  
◊ U+25CA           D7
ﬁ U+FB01           DE
ﬂ U+FB02           DF
  • The mappings for the IBM code pages are from the Unicode site supplied by Microsoft.[citation needed] The Unicode Consortium's document has links to sources giving the differences between IBM's and Microsoft's mappings for these code pages.[4]
  • IBM437 and IBM850 defined printable characters for the control code ranges. These could not be used when printing text through DOS, as they would be trapped before reaching the screen, but applications that wrote to screen memory directly could use them.
  • Macintosh has an Apple logo at 0xF0, which is translated to U+F8FF in the Unicode Private Use Area.
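A single row of the table above can be approximated from Python's bundled codecs, as in the sketch below. The pairing of each IANA column label with a Python codec name is this example's assumption, `table_row` is an illustrative helper, and Python's mapping tables are not guaranteed to agree with the sources behind the table in every cell.

```python
# Table column label -> Python codec name (the pairing is an assumption).
COLUMNS = {
    "ISO-8859-1": "latin-1",
    "ISO-8859-15": "iso8859-15",
    "WINDOWS-1252": "cp1252",
    "IBM437": "cp437",
    "IBM850": "cp850",
    "MACINTOSH": "mac_roman",
}


def table_row(char: str) -> dict:
    """Build one comparison row: the code point plus a hex byte per column."""
    row = {"Code point": f"U+{ord(char):04X}"}
    for label, codec in COLUMNS.items():
        try:
            row[label] = char.encode(codec).hex().upper()
        except UnicodeEncodeError:
            row[label] = ""  # character absent from this set
    return row


if __name__ == "__main__":
    for ch in "é\u0152":
        print(ch, table_row(ch))

    # The Apple logo note: byte 0xF0 decodes to a Private Use Area code point.
    print(f"U+{ord(bytes([0xF0]).decode('mac_roman')):04X}")
```

Empty strings play the role of the blank cells in the table, marking a character that the set cannot represent.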

Notes

  1. IBM's PC DOS 2000, released in 1998, changed its definition of code page 850 to what IBM called modified code page 850, which includes the euro sign at code point 213 (0xD5), instead of adding support for the new code page 858. The reason might have been existing restrictions in the implementation of the codepage switching logic under MS-DOS/PC DOS, which limited .CPI files to 64 KB in size, or about six codepages at most; this limitation was circumvented in some OEM versions of MS-DOS and in Windows NT, and does not exist in DR-DOS. Further, the parser in MS-DOS/PC DOS limits the number of country/codepage entries in COUNTRY.SYS files to a maximum of 146 or 438, a limitation that does not exist in DR-DOS. Adding support for codepage 858 might therefore have meant dropping another codepage (e.g. codepage 850) at the same time, which might not have been viable then, given that some applications were hard-wired to use codepage 850.

References

  1. "00858". Code pages by CPGID. IBM. Archived from the original on 2016-06-06. Retrieved 2016-06-06.
  2. Paul, Matthias R. (2001-08-15). "Changing codepages in FreeDOS" (Technical design specification based on fd-dev post [1]). Archived from the original on 2016-06-06. Retrieved 2016-06-06. The new official ID for the Multilingual "codepage 850 with EURO SIGN" is 858, not 850. IBM will switch to use 858 instead of their 850 variant with future issues of their products. [...] I can only guess why they didn't add 858 to their EGAx.CPI, COUNTRY.SYS, and KEYBOARD.SYS files in PC DOS 2000. Many third-party applications are designed to work with 850 and didn't know about 858 at the time PC DOS 2000 was released, so it's easier for everyone, but unfortunately it's not compatible. [...] As explained above, COUNTRY.SYS and KEYBOARD.SYS contain only two codepage entries for a given country in Western issues of DOS. (In Arabic and Hebrew issues there can be up to 8 codepages for one country, in theory there is no limit below the range of allowed codepages 1..65534). [...] The problem is that removing support for 850 might have caused compatibility problems with applications which are hard-wired to use 850. Adding 858 as a third choice to all the files would have increased the file and table sizes significantly. The COUNTRY.SYS file parser in MS-DOS/PC DOS IO.SYS/IBMBIO.COM sets aside a 6 Kb (for DOS 6) scratchpad to load all the info. This allows a maximum of 438 entries in a COUNTRY.SYS file to be accepted, otherwise you will get the message "COUNTRY.SYS too large.". The NLSFUNC parser does not have this limitation, and the file parsers in DR-DOS (kernel and NLSFUNC) also do not know of such a restriction. Older issues of MS-DOS/PC DOS even had a 2 Kb buffer for a maximum of 146 entries.
  3. Paul, Matthias R. (2001-08-27). "Changing codepages in FreeDOS (follow-up)". Archived from the original on 2014-10-01. Retrieved 2013-05-08. [...] one could also create custom .CPI files in the traditional FONT style without difficulties, but you could only store up to [...] six codepages in such a file if it should be useable by MS-DOS/PC DOS (some OEM issues and NT can handle files larger than 64 Kb, but MS-DOS/PC DOS can not).
  4. "IBM Conversion Mapping Tables". Unicode Consortium.
