Wikipedia

Precomposed character

A precomposed character (alternatively composite character or decomposable character) is a Unicode entity that can also be defined as a sequence of one or more other characters. A precomposed character typically represents a letter with a diacritical mark, such as é (Latin small letter e with acute accent). Technically, é (U+00E9) is a character that can be decomposed into an equivalent string of the base letter e (U+0065) and the combining acute accent (U+0301). Similarly, ligatures are precompositions of their constituent letters or graphemes.
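
The decomposition of é can be inspected with Python's standard `unicodedata` module. The sketch below assumes Python 3, and its variable names are illustrative. It also shows that a ligature such as ﬁ (U+FB01) is recorded in the Unicode Character Database with only a compatibility decomposition, so NFKD normalization splits it while NFD leaves it alone.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)

# The Unicode Character Database stores the canonical decomposition.
print(unicodedata.decomposition(precomposed))     # 0065 0301
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0065', 'U+0301']
print(unicodedata.name(decomposed[1]))            # COMBINING ACUTE ACCENT

# NFC recombines the base letter and the mark into U+00E9.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# The ligature ﬁ (U+FB01) has only a compatibility decomposition:
# NFD leaves it unchanged, NFKD splits it into "f" + "i".
ligature = "\ufb01"
print(unicodedata.decomposition(ligature))                 # <compat> 0066 0069
print(unicodedata.normalize("NFD", ligature) == ligature)  # True
print(unicodedata.normalize("NFKD", ligature))             # fi
```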

Precomposed characters are the legacy solution for representing many special letters in various character sets. In Unicode, they are included primarily to aid computer systems with incomplete Unicode support, where equivalent decomposed characters may render incorrectly.

Comparing precomposed and decomposed characters

In the following example, the common Swedish surname Åström is written in two alternative ways: first with the precomposed Å (U+00C5) and ö (U+00F6), and second with the base letter A (U+0041) followed by a combining ring above (U+030A) and the letter o (U+006F) followed by a combining diaeresis (U+0308).

  1. Åström (U+00C5 U+0073 U+0074 U+0072 U+00F6 U+006D)
  2. Åström (U+0041 U+030A U+0073 U+0074 U+0072 U+006F U+0308 U+006D)
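
The equivalence of the two lines can be checked programmatically. In this minimal sketch (Python 3; `code_points` is a hypothetical helper), a plain string comparison treats the two forms as different, while normalizing both to NFC or NFD makes them compare equal.

```python
import unicodedata

# Line 1: precomposed Å (U+00C5) and ö (U+00F6).
precomposed = "\u00c5str\u00f6m"
# Line 2: A + combining ring above (U+030A), o + combining diaeresis (U+0308).
decomposed = "A\u030astro\u0308m"

def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

print(code_points(precomposed))  # U+00C5 U+0073 U+0074 U+0072 U+00F6 U+006D
print(code_points(decomposed))   # U+0041 U+030A U+0073 U+0074 U+0072 U+006F U+0308 U+006D

# A plain comparison sees two different code point sequences...
print(precomposed == decomposed, len(precomposed), len(decomposed))  # False 6 8

# ...but both normalize to the same string: they are canonically equivalent.
for form in ("NFC", "NFD"):
    same = unicodedata.normalize(form, precomposed) == unicodedata.normalize(form, decomposed)
    print(form, same)  # NFC True, then NFD True
```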

The two forms are canonically equivalent and should render identically. In practice, however, some Unicode implementations still have difficulties with decomposed characters. In the worst case, combining diacritics may be ignored, or rendered as unrecognized characters after their base letters, because they are not included in all fonts. To work around these problems, some applications simply replace decomposed sequences with the equivalent precomposed characters.
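
Replacing decomposed sequences with precomposed characters corresponds to Unicode Normalization Form C (NFC). The sketch below assumes Python 3.9 or later and uses a hypothetical helper, `prefer_precomposed`, which applies NFC and reports any combining marks left over because Unicode defines no precomposed character for that combination (for example, x with an acute accent).

```python
import unicodedata

def prefer_precomposed(text: str) -> tuple[str, list[str]]:
    """Hypothetical helper: apply NFC and list combining marks it could not absorb.

    NFC replaces a base letter plus combining marks with a precomposed
    character wherever Unicode defines one; other marks stay separate.
    """
    composed = unicodedata.normalize("NFC", text)
    # Simplification: treat any character with a nonzero canonical combining
    # class as a leftover combining mark (a few marks have class 0).
    leftovers = [f"U+{ord(ch):04X}" for ch in composed if unicodedata.combining(ch)]
    return composed, leftovers

composed, leftovers = prefer_precomposed("A\u030astro\u0308m")
print(composed == "\u00c5str\u00f6m", leftovers)  # True []

# Unicode has no precomposed "x with acute", so the mark remains separate.
print(prefer_precomposed("x\u0301")[1])  # ['U+0301']
```

An application relying on this approach still has to handle the leftover marks, since NFC cannot remove the need for combining-character support entirely.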

With an incomplete font, however, precomposed characters can also be problematic, especially the more exotic ones, as in the following example (the reconstructed Proto-Indo-European word for "dog"):

  1. ḱṷṓn (U+1E31 U+1E77 U+1E53 U+006E)
  2. ḱṷṓn (U+006B U+0301 U+0075 U+032D U+006F U+0304 U+0301 U+006E)
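
The two lines of the Proto-Indo-European example are canonically equivalent as well. This sketch (Python 3, standard library only) checks that NFD turns the first line into exactly the code points of the second, and that NFC turns the second back into the first. The letter ṓ decomposes in two stages, because its own decomposition begins with the precomposed ō (U+014D).

```python
import unicodedata

precomposed = "\u1e31\u1e77\u1e53n"          # line 1: ḱṷṓn
decomposed = "k\u0301u\u032do\u0304\u0301n"  # line 2: base letters + combining marks

# ṓ (U+1E53) decomposes to ō (U+014D) + U+0301, and ō in turn to o + U+0304.
print(unicodedata.decomposition("\u1e53"))  # 014D 0301

# Full canonical decomposition of line 1 yields exactly line 2, and vice versa.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# List the code points and character names of line 1.
for ch in precomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```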

In some situations, the precomposed k, u, and o with diacritics on the first line may render as unrecognized characters, or their typographical appearance may differ markedly from that of the final letter n, which has no diacritic. On the second line, the base letters should at least render correctly even if the combining diacritics are not recognized.
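
The fallback argument can be illustrated by simulating a worst-case renderer that simply discards every combining mark. This is an assumption made for illustration, not a description of any particular renderer, and `drop_combining_marks` is a hypothetical helper. Precomposed letters have canonical combining class 0, so nothing is discarded from the first line, which means a font lacking those code points has no plain letter to fall back on.

```python
import unicodedata

def drop_combining_marks(text: str) -> str:
    """Hypothetical worst-case renderer: ignore every character with a
    nonzero canonical combining class and keep everything else."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))

line1 = "\u1e31\u1e77\u1e53n"           # precomposed ḱṷṓn
line2 = "k\u0301u\u032do\u0304\u0301n"  # decomposed form

print(drop_combining_marks(line2))           # kuon
print(drop_combining_marks(line1) == line1)  # True: nothing to strip
```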

OpenType provides the ccmp (glyph composition/decomposition) feature tag, which fonts use to define glyphs that are compositions or decompositions involving combining characters.

Chinese characters

In theory, most Chinese characters as encoded by Han unification and similar schemes could be treated as precomposed characters, since they can be reduced (decomposed) to their constituent radical and phonetic components with Chinese character description languages. Such an approach could reduce the number of characters in the character set from tens of thousands to just a few thousand. On the other hand, a decomposed character set would introduce challenges for searching and editing software and would require more bytes of encoding per document. One particular challenge would be the many-to-many mapping between sets of components and precomposed characters: one precomposed character may be decomposed into several different sets of components, while one set of components could combine into several different precomposed characters. There are no strict requirements or constraints on the relative position of components within a character, on the variant forms and transformations (narrowing, widening, stretching, rotating, etc.) applied to components, or on the number of times each component appears.
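
Unicode's Ideographic Description Characters (the block starting at U+2FF0) provide one such description language, written as Ideographic Description Sequences (IDS). The table in this Python 3 sketch is hand-written for illustration and is not drawn from any IDS database. It shows one set of components, 口 and 木, combining into three different characters depending on their arrangement; that Unicode defines no canonical decomposition for these ideographs; and that a description sequence takes more bytes than the encoded character in UTF-8.

```python
import unicodedata
from collections import defaultdict

# Hand-written, illustrative Ideographic Description Sequences (IDS).
# ⿱ (U+2FF1) = above to below, ⿴ (U+2FF4) = full surround.
IDS = {
    "呆": "⿱口木",  # 口 above 木
    "杏": "⿱木口",  # 木 above 口
    "困": "⿴口木",  # 口 surrounding 木
}

def components(ids: str) -> tuple:
    """Sorted components of an IDS, ignoring the description operators
    (the Ideographic Description Characters block, U+2FF0..U+2FFF)."""
    return tuple(sorted(ch for ch in ids if not ("\u2ff0" <= ch <= "\u2fff")))

by_components = defaultdict(list)
for char, ids in IDS.items():
    by_components[components(ids)].append(char)

# One set of components maps to several precomposed characters.
print(dict(by_components))  # {('口', '木'): ['呆', '杏', '困']}

# Unicode defines no canonical decomposition for these ideographs.
print(all(unicodedata.decomposition(ch) == "" for ch in IDS))  # True

# In UTF-8 the description costs three times the bytes of the character.
print(len("呆".encode("utf-8")), len(IDS["呆"].encode("utf-8")))  # 3 9
```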

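See also

  • List of precomposed Latin characters in Unicode
  • Dead key
  • Compose key
  • Combining character
  • Unicode equivalence
  • Complex text layout
  • Unicode compatibility characters
  • Alphabetic Presentation Forms (Unicode block)
  • Arabic Presentation Forms-A (Unicode block)
  • Arabic Presentation Forms-B (Unicode block)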

External links

  • Free Idg Serif, a derivative of the FreeSerif font with added declarations of precomposed characters.

Retrieved from https://en.wikipedia.org/w/index.php?title=Precomposed_character&oldid=1196805501