Wikipedia

Private Use Areas

In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium.[1] Three private use areas are defined: one in the Basic Multilingual Plane (U+E000–U+F8FF), and one each in, and nearly covering, planes 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD). The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy,[2] the Private Use Areas will remain allocated for that purpose in all future Unicode versions.

Assignments to Private Use Area characters need not be private in the sense of strictly internal to an organisation; a number of assignment schemes have been published by several organisations. Such publication may include a font that supports the definition (showing the glyphs), and software making use of the private-use characters (e.g. a graphics character for a "print document" function). By definition, multiple private parties may assign different characters to the same code point, with the consequence that a user may see one private character from an installed font where a different one was intended.

Definition

Under the Unicode definition, code points in the Private Use Areas are assigned characters—they are not noncharacters, reserved, or unassigned. Their category is "Other, private use (Co)", and no character names are specified. No representative glyphs are provided, and character semantics are left to private agreement.

Private-use characters are assigned Unicode code points whose interpretation is not specified by this standard and whose use may be determined by private agreement among cooperating users. These characters are designated for private use and do not have defined, interpretable semantics except by private agreement.

...

No charts are provided for private-use characters, as any such characters are, by their very nature, defined only outside the context of this standard.[3]
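As a minimal sketch of the definition above, the following Python snippet tests whether a code point falls in one of the three Private Use Area ranges and compares the result with the General Category reported by Python's `unicodedata` module. The helper names `is_private_use` and `find_private_use` are hypothetical and chosen only for illustration.

```python
import unicodedata

# The three Private Use Area ranges given in the article (inclusive bounds).
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(cp: int) -> bool:
    """Range-based test: True if cp lies in one of the three PUA ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def find_private_use(text: str) -> list:
    """Return (index, 'U+XXXX') for each private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_private_use(ord(ch))
    ]


sample = "logo: \uf8ff, plain: A"
print(find_private_use(sample))            # positions of private-use characters
print(unicodedata.category("\uf8ff"))      # General Category from Python's Unicode data
print(unicodedata.name("\uf8ff", None))    # private-use characters have no name
```

Such a check can only say that a character is private-use; what it means still depends on the private agreement in force.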

Assignment

In the Basic Multilingual Plane (plane 0), the block titled Private Use Area has 6,400 code points.

Planes 15 and 16 are almost[note 1] entirely assigned to two further Private Use Areas, Supplementary Private Use Area-A and Supplementary Private Use Area-B respectively. In UTF-16, a subset of the high surrogates (U+DB80..U+DBFF) is used for these and only these planes; these code units are called High Private Use Surrogates.
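The relationship between planes 15 and 16 and the High Private Use Surrogates can be checked with the standard UTF-16 surrogate arithmetic. The sketch below is illustrative only; the function name `utf16_surrogates` is hypothetical.

```python
def utf16_surrogates(cp: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return high, low


# First and last private-use code points of planes 15 and 16.
for cp in (0xF0000, 0xFFFFD, 0x100000, 0x10FFFD):
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}")
```

The high surrogates produced for these two planes run from DB80 to DBFF, while the last code point of plane 14 (U+EFFFF) already yields DB7F, which is outside that block.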

Unicode: Private Use Areas
Definition by character property: General Category=Co[a][b]

Range | Plane | Block name | Number of code points
U+E000..U+F8FF | BMP (0) | Private Use Area | 6,400
U+F0000..U+FFFFD[c] | PUP (15)[d] | Supplementary Private Use Area-A | 65,534
U+100000..U+10FFFD[c] | PUP (16)[d] | Supplementary Private Use Area-B | 65,534

Note: UTF-16 encodes the characters of planes 15 and 16 using code points from the block High Private Use Surrogates (U+DB80..U+DBFF) in the BMP.

Notes
  a. ^ Unicode 15.1 Data
  b. ^ The Unicode Standard, Section 23.5: Private-Use Characters
  c. ^ Code points U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are noncharacters, not private-use characters.
  d. ^ Private Use Plane: Unicode has not published identifying names for planes 15 and 16. Chapter 2.8 says "The two Private Use Planes (Planes 15 and 16)", while the PUA block names used are Supplementary PUA-A and Supplementary PUA-B.
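The code point counts in the table follow directly from the inclusive range bounds. A short, illustrative Python check (the variable names are arbitrary):

```python
# Inclusive ranges of the three Private Use Areas, as listed in the table.
ranges = {
    "Private Use Area": (0xE000, 0xF8FF),
    "Supplementary Private Use Area-A": (0xF0000, 0xFFFFD),
    "Supplementary Private Use Area-B": (0x100000, 0x10FFFD),
}

# Size of an inclusive range is last - first + 1.
sizes = {name: hi - lo + 1 for name, (lo, hi) in ranges.items()}

for name, size in sizes.items():
    print(f"{name}: {size:,}")
print(f"total: {sum(sizes.values()):,}")
```

Each supplementary area has 65,534 rather than 65,536 code points because the last two code points of its plane are noncharacters.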

Unicode PUA blocks

There are three PUA blocks in Unicode.[3]

Private Use Area
  Range: U+E000..U+F8FF (6,400 code points)
  Plane: BMP
  Scripts: Unknown
  Assigned: 6,400 code points
  Unused: 0 reserved code points
  Unicode version history:
    1.0.0 (1991): 5,632 (+5,632)
    1.0.1 (1992): 6,400 (+768)
  Note: Version 1.0.1 moved and expanded the Private Use Area block (previously located at U+E800..U+FDFF in version 1.0.0).[4][5][6]

Supplementary Private Use Area-A
  Range: U+F0000..U+FFFFF (65,536 code points)
  Plane: SPUA-A (plane 15)
  Scripts: Unknown
  Assigned: 65,534 code points
  Unused: 0 reserved code points, 2 noncharacters
  Unicode version history:
    2.0 (1996): 65,534 (+65,534)[5][6]

Supplementary Private Use Area-B
  Range: U+100000..U+10FFFF (65,536 code points)
  Plane: SPUA-B (plane 16)
  Scripts: Unknown
  Assigned: 65,534 code points
  Unused: 0 reserved code points, 2 noncharacters
  Unicode version history:
    2.0 (1996): 65,534 (+65,534)[5][6]

History

In Unicode 1.0.0, the private use area extended from U+E800 to U+FDFF (i.e. did not include U+E000..E7FF, but additionally included the U+F900..FDFF range now occupied by CJK Compatibility Ideographs, Alphabetic Presentation Forms and Arabic Presentation Forms-A).[7] This was changed to U+E000..F8FF in Unicode 1.0.1,[4] and remained so in Unicode 1.1.[8] Contrary to misconception, the range U+D800..DFFF (reserved for UTF-16 surrogates since Unicode 2.0) was not included in the private use range of any Unicode 1.x version.

Historically, planes E0 (224) through FF (255), and groups 60 (96) through 7F (127) of the Universal Coded Character Set (i.e. U+E00000 through U+FFFFFF and U+60000000 through U+7FFFFFFF) were also designated as private use. These ranges were removed from the specified private-use ranges when the UCS was restricted to the seventeen planes reachable in UTF-16.[9]

Usage

Standardization initiative uses

Many people and institutions have created character collections for the PUA. Some of these private use agreements are published, so other PUA implementers can aim for unused or less used code points to prevent overlaps. Several characters and scripts previously encoded in private use agreements have actually been fully encoded in Unicode, necessitating mappings from the PUA to other Unicode code points.

One of the more well-known and broadly implemented PUA agreements is maintained by the ConScript Unicode Registry (CSUR). The CSUR, which is not officially endorsed or associated with the Unicode Consortium, provides a mapping for constructed scripts, such as Klingon pIqaD and Ferengi script (Star Trek), Tengwar and Cirth (J.R.R. Tolkien's cursive and runic scripts), Alexander Melville Bell's Visible Speech, and Dr. Seuss' alphabet from On Beyond Zebra. The CSUR previously encoded the undeciphered Phaistos characters, as well as the Shavian and Deseret alphabets, which have all been accepted for official encoding in Unicode.

Another common PUA agreement is maintained by the Medieval Unicode Font Initiative (MUFI). This project is attempting to support all of the scribal abbreviations, ligatures, precomposed characters, symbols, and alternate letterforms found in medieval texts written in the Latin alphabet. The express purpose of MUFI is to experimentally determine which characters are necessary to represent these texts, and to have those characters officially encoded in Unicode. As of Unicode version 5.1, 152 MUFI characters have been incorporated into the official Unicode encoding.

Some agreed-upon PUA character collections exist in part or whole because the Unicode Consortium is in no hurry to encode them. Some, such as unrepresented languages, are likely to end up encoded in the future. Some unusual cases such as fictional languages are outside the usual scope of Unicode but not explicitly ruled out by the principles of Unicode, and may show up eventually (such as the Star Trek and Tolkien writing systems). In other cases, the proposed encoding violates one or more Unicode principles and hence is unlikely to ever be officially recognized by Unicode—mostly where users want to directly encode alternate forms, ligatures, or base-character-plus-diacritic combinations (such as the TUNE scheme).

Publishing organisation | Topic | PUA area used | Font
CSUR | Artificial and some ancient/medieval scripts | PUA (BMP) and Plane 15 | Code2000
MUFI | Medieval scripts | PUA (BMP) | several
SIL | Phonetics and languages | PUA (BMP) | Charis SIL
TITUS | Ancient and medieval scripts | PUA (BMP) | TITUS Cyberbit Basic

  • Emoji is an encoding for picture characters or emoticons used in Japanese wireless messages and webpages. With Unicode 6.0 and later, many of these have been encoded in the block Miscellaneous Symbols And Pictographs and elsewhere in the SMP.
  • GB/T 20542-2006 ("Tibetan Coded Character Set Extension A") and GB/T 22238-2008 ("Tibetan Coded Character Set Extension B") are Chinese national standards that use the PUA to encode precomposed Tibetan ligatures.
  • GB 18030 and GBK use the PUA to provisionally encode characters not found in Unicode standards at the time of publication (most have been encoded since then).
  • The Institute of the Estonian Language uses the PUA to encode Latin and Cyrillic precomposed characters[10] that have no Unicode encoding.
  • The Free Tengwar Font Project uses a different mapping from the ConScript Unicode Registry that largely follows Michael Everson's 2001-03-07 Tengwar discussion paper, but diverges in some details.
  • The MARC 21 standard uses the PUA to encode East Asian characters present in MARC-8[11] that have no Unicode encoding.
  • The SIL Corporate PUA uses the PUA to encode characters used in minority languages that have not yet been accepted into Unicode.
  • The STIX Fonts project uses the PUA to provide a comprehensive font set of mathematical symbols and alphabets, many of which are also available in the SMP now, e.g. in the Mathematical Alphanumeric Symbols block.
  • The Tamil Unicode New Encoding (TUNE)[12] is a proposed scheme for encoding Tamil that overcomes perceived deficiencies in the current Unicode encoding.

Vendor use

Informally, the range U+F000 through U+F8FF is known as the Corporate Use Area. This originates from early versions of Unicode, which defined an "End User Zone" extending from U+E000 upward and a "Corporate Use Zone" extending from U+F8FF downward, with the boundary between the two left undefined.[8]

  • The Adobe Glyph List used to use the PUA for some of its glyphs.[13]
  • Apple lists a range of 1,280 characters in its developer documentation[14] from U+F400–U+F8FF within the PUA for Apple's use. Of those, only 311 are used, in the range U+F700–U+F8FF, by NeXT (NeXTSTEP and OPENSTEP) and Apple (Mac OS X AppKit).[15]
    • One of these is U+F8FF, the Apple logo, generally supported by Apple's 8-bit sets.
  • WGL4 uses the PUA (U+F001 and U+F002) to encode duplicates of the ligatures fi (U+FB01) and fl (U+FB02).[16]
  • Microsoft's defunct Services For Macintosh feature used U+F001 through U+F029 as replacements for special characters allowed in HFS but forbidden in NTFS, and U+F02A for the Apple logo.[17][18]
  • In old versions of its RichEdit component, Microsoft mapped U+F020–U+F0FF within the PUA to symbol fonts. For any character in this range, RichEdit would show a character from a symbol font instead of the end-user-defined character (EUDC).[19][20]
  • AutoCAD uses U+F8FC–U+F8FE for ⌀ (diameter sign), ± (plus–minus sign) and ° (degree sign) respectively.
  • Some fonts place the Windows logo at U+F000.
  • In some video games, such as Agar.io, U+F000 is used for a succession of numerals starting at 13 or 18.
  • On Ubuntu, U+E0FF is displayed as the "Circle Of Friends" logo[21] and U+F200 is "ubuntu" in the Ubuntu typeface with a superscripted "Circle Of Friends" (this itself is U+F0FF).[22]
  • The 3270 font includes the Debian logo at U+F100.
  • In the Linux Libertine font, U+E000 displays Tux, the mascot of Linux.
  • The Font Awesome icon font uses the PUA to display various glyphs.
  • Powerline, a status line plugin for Vim, uses U+E0A0–U+E0A2 and U+E0B0–U+E0B3 for extra box-drawing characters.[23][24][25]
  • On the Fira Sans typeface used in Firefox OS, U+E003 is displayed as the Mozilla logo (the dinosaur head).
  • Lotus Multi-Byte Character Set (LMBCS), the encoding and character set internally used by Lotus/IBM Lotus 1-2-3, Symphony, SmartSuite, Notes, Domino as well as a number of third-party products such as Microsoft Works, uses some characters (U+F862–U+F89F and U+F8FB–U+F8FE) in the Private Use Area for symbols not defined in Unicode. Of these, U+F8FB is known to be reserved for a crown currency symbol ("Kr"), and U+F8FC and U+F8FD were later mapped to U+FB02 (fl) and U+FB01 (fi) respectively. Additionally, when UTF-16 codes are embedded in LMBCS, the UTF-16 codes corresponding to U+F601 through U+F6FF are substituted for UTF-16 codes which would contain null bytes, since LMBCS is designed to not contain embedded null bytes.[26][27]
  • IBM reserved several code page IDs for PUA code pages: code page 1446 for the generic plane 15, code page 1447 for the generic plane 16, code page 1448 for the generic BMP PUA, code page 1445 (IBM AFP PUA No. 1) for plane 15 with IBM allocations in U+FFF00–U+FFFFD,[28][29] and code page 1449 (IBM default PUA) for the BMP PUA with IBM allocations in U+F83D–U+F8FF.[30][31]
  • The file system found in Windows uses the U+F000 to U+F0FF block to escape special characters.
  • NetApp translates characters in filenames that are allowed on Unix but invalid for SMB clients to PUA characters.[32]
  • Twitter's Chirp font provides some additional icons, like U+E000 which corresponds to a left down arrow, U+EA00 which corresponds to the Twitter bird, and U+F8FF which corresponds to an Apple logo, possibly for compatibility with Apple fonts.[33]
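When a vendor's private-use assignment duplicates a character that has a standard code point, text can be normalized by mapping the PUA code points to their standard equivalents. The sketch below uses the WGL4 ligature duplicates from the list above (U+F001 and U+F002); the table and function names are hypothetical, and the mapping is only valid for text known to follow the WGL4 convention, since other parties may assign different characters to the same code points.

```python
# Mapping taken from the WGL4 entry above: PUA duplicates -> standard ligatures.
WGL4_PUA_TO_STANDARD = {
    0xF001: 0xFB01,  # fi ligature
    0xF002: 0xFB02,  # fl ligature
}


def normalize_wgl4(text: str) -> str:
    """Replace WGL4 private-use ligature duplicates with their standard code points."""
    # str.translate accepts a dict mapping code point ordinals to ordinals.
    return text.translate(WGL4_PUA_TO_STANDARD)


legacy = "\uf001nd \uf002ow"
fixed = normalize_wgl4(legacy)
print([f"U+{ord(c):04X}" for c in fixed if ord(c) > 0x7F])
```

Characters outside the mapping, including other private-use code points, pass through unchanged.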

Private-use characters in other character sets

The concept of reserving specific code points for Private Use is based on similar earlier usage in other character sets. In particular, many otherwise obsolete characters in East Asian scripts continue to be used in specific names or other situations, and so some character sets for those scripts made allowance for private-use characters (such as the user-defined planes of CNS 11643, or gaiji in certain Japanese encodings). The Unicode standard references these uses under the name "End User Character Definition" (EUCD).[3]

Additionally, the C1 control block contains two codes intended for private use "control functions" by ECMA-48: 0x91 private use one (PU1) and 0x92 private use two (PU2).[34][35] Unicode includes these at U+0091 <control-0091> and U+0092 <control-0092> but defines them as control characters (category Cc), not private-use characters (category Co).[5][36]

Encodings which do not have private use areas but have more or less unused areas, such as ISO/IEC 8859 and Shift JIS, have seen uncontrolled variants of these encodings evolve.[37] For Unicode, software companies can use the Private Use Areas for their desired additions.

Notes

  1. ^ The last two characters of every plane are defined to be noncharacters. The remaining 65,534 characters of each of planes 15 and 16 are assigned as private-use characters.

References

  1. ^ "Glossary of Unicode Terms: "Private Use Area (PUA)"". Unicode Consortium.
  2. ^ "Unicode Character Encoding Stability Policy". 2021-11-10. Retrieved 2022-03-03.
  3. ^ a b c "Chapter 23 Special Areas and Format Characters" (PDF). The Unicode Standard Version 14.0 - Core Specification. Private Use characters.
  4. ^ a b "Unicode 1.0.1" (PDF). The Unicode Standard. 1992-11-03. Archived (PDF) from the original on 2016-07-02. Retrieved 2016-07-09.
  5. ^ a b c d "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  6. ^ a b c "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  7. ^ "3.5: Private Use Area" (PDF). The Unicode Standard, Version 1.0, Volume 1. Unicode Consortium. 1991. pp. 118–119. ISBN 0-201-56788-1. Archived (PDF) from the original on 2021-10-21. Retrieved 2021-10-11.
  8. ^ a b "2.0: Changes in Unicode 1.0" (PDF). The Unicode Standard, Version 1.1. Unicode Consortium. pp. 3–4. UTR #4. Archived (PDF) from the original on 2021-11-20. Retrieved 2021-10-11.
  9. ^ Whistler, Ken (2000). "Necessary changes for ISO/IEC 10646 regarding the PUA". UTC/00-015. Archived from the original on 2021-06-23. Retrieved 2021-01-30.
  10. ^ "Letter Database". Eki.ee. Archived from the original on 2018-05-21. Retrieved 2013-04-11.
  11. ^ "Character Sets: East Asian Characters: Alternative Unicode Mappings for MARC 21 Characters Assigned to the Private Use Area (PUA): MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media". Library of Congress. 2004-09-02. Archived from the original on 2013-08-19. Retrieved 2013-04-11.
  12. ^ tunerfc.tn.nic.in. Archived from the original on 2010-07-29. Retrieved 2013-04-11.
  13. ^ 1998-10-22. Archived from the original on 2002-10-09. Retrieved 2021-05-12.
  14. ^ "NSOpenStepUnicodeReservedBase - Apple Developer Documentation". Apple Inc. Archived from the original on 2020-11-06. Retrieved 2020-10-16.
  15. ^ Apple Computer, Inc. (2005) [1994]. "CORPCHAR.TXT - Registry (external version) of Apple use of Unicode corporate-zone characters". c03. Unicode Inc. Archived from the original on 2020-10-30. Retrieved 2020-10-16.
  16. ^ Microsoft. Archived from the original on 2014-07-17.
  17. ^ Microsoft Support. 2014-02-24. Archived from the original on 2016-05-27.
  18. ^ "ntfs.util.c". 2008. Archived from the original on 2018-08-07. Retrieved 2018-08-07. Invalid NTFS filename characters are encodeded [sic] using the SFM (Services for Macintosh) private use Unicode characters.
  19. ^ Microsoft Knowledge Base. Archived from the original on 2012-10-22.
  20. ^ SIL International. 2003-04-25. Archived from the original on 2015-05-11. Retrieved 2014-03-04.
  21. ^ "Comment #8 : Bug #651606 (circle-of-friends) : Bugs : Ubuntu Font Family". Launchpad. 2010-10-05. Archived from the original on 2020-10-17. Retrieved 2020-10-17.
  22. ^ "Comment #2 : Bug #853855 : Bugs : Ubuntu Font Family". Launchpad. 2011-09-26. Archived from the original on 2020-10-17. Retrieved 2020-10-17.
  23. ^ "Powerline status line plugin question on Stack Exchange mentioning private use area characters". Archived from the original on 2015-03-12. Retrieved 2015-03-22.
  24. ^ "Pictures showing private use area characters in Powerline patched fonts". Archived from the original on 2015-05-11. Retrieved 2015-03-22.
  25. ^ Li, Renzhi (2019-08-23). "Proposal to add additional characters into the Graphics for Legacy Computing block of the UCS" (PDF). Retrieved 2023-07-31.
  26. ^ "lmb-excp.ucm". GitHub. 2000-02-10. Archived from the original on 2022-01-25. Retrieved 2020-04-23.
  27. ^ "Anhang 2. Der Lotus Multibyte Zeichensatz (LMBCS)" [Appendix 2. The Lotus Multibyte Character Set (LMBCS)]. Lotus 1-2-3 Version 3.1 Referenzhandbuch [Lotus 1-2-3 Version 3.1 Reference Manual] (in German) (1 ed.). Cambridge, Massachusetts, US: Lotus Development Corporation. 1989. pp. A2–1 – A2–13. 302168.
  28. ^ "CPGID 01445 (chart)" (PDF). REGISTRY: Graphic Character Sets and Code Pages. 2012 [2011]. C-H 3-3220-050. The area shown in the chart above represents only 254 bytes of row FF in plane 0F.
  29. ^ "CPGID 01445: IBM AFP PUA No. 1". REGISTRY: Graphic Character Sets and Code Pages. 2012 [2011]. C-H 3-3220-050. The area shown in the chart above represents only 254 bytes of row FF in plane 0F.
  30. ^ IBM Globalization: Code page identifiers. IBM. Archived from the original on 2015-09-16. IBM has designated 195 positions from U+F83D to U+F8FF for use as IBM Corporate-zone and intends to use them consistently within IBM whenever there is a need to maintain the round-trip integrity of IBM characters.
  31. ^ IBM (1997). unicode.nam: Allow the Unicode characters to be specified using either the IBM or PostScript like names. (Included with Borgendale, Ken, OS/2 Codepage and Keyboard Display Tools)
  32. ^ "Configure character mapping for SMB file name translation on volumes". 2021-12-09. Retrieved 2022-10-14.
  33. ^ "Twitter Chirp Font". Copy Paste Dump. Retrieved 2022-02-08.
  34. ^ "Standard ECMA-48, Fifth Edition - June 1991" (PDF). §8.2.14 Miscellaneous control functions, §8.3.100, §8.3.101.
  35. ^ ISO/TC97/SC2 (1983-10-01). C1 Control Character Set of ISO 6429 (PDF). ITSCJ/IPSJ. ISO-IR-77.
  36. ^ "Chapter 4 Character Properties" (PDF). The Unicode Standard Version 14.0 - Core Specification. Table 4-4.
  37. ^ "Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later". Archived from the original on 2021-08-31. Retrieved 2021-10-08.

private, areas, this, article, about, unicode, range, codepoints, other, uses, private, area, disambiguation, unicode, private, area, range, code, points, that, definition, will, assigned, characters, unicode, consortium, three, private, areas, defined, basic,. This article is about the Unicode PUA range of codepoints For other uses see Private use area disambiguation In Unicode a Private Use Area PUA is a range of code points that by definition will not be assigned characters by the Unicode Consortium 1 Three private use areas are defined one in the Basic Multilingual Plane U E000 U F8FF and one each in and nearly covering planes 15 and 16 U F0000 U FFFFD U 100000 U 10FFFD The code points in these areas cannot be considered as standardized characters in Unicode itself They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments Under the Unicode Stability Policy 2 the Private Use Areas will remain allocated for that purpose in all future Unicode versions Assignments to Private Use Area characters need not be private in the sense of strictly internal to an organisation a number of assignment schemes have been published by several organisations Such publication may include a font that supports the definition showing the glyphs and software making use of the private use characters e g a graphics character for a print document function By definition multiple private parties may assign different characters to the same code point with the consequence that a user may see one private character from an installed font where a different one was intended Contents 1 Definition 2 Assignment 2 1 Unicode PUA blocks 2 2 History 3 Usage 3 1 Standardization initiative uses 3 2 Vendor use 4 Private use characters in other character sets 5 Notes 6 ReferencesDefinition editUnder the Unicode definition code points in the Private Use Areas are assigned characters they are not noncharacters reserved or 
unassigned Their category is Other private use Co and no character names are specified No representative glyphs are provided and character semantics are left to private agreement Private use characters are assigned Unicode code points whose interpretation is not specified by this standard and whose use may be determined by private agreement among cooperating users These characters are designated for private use and do not have defined interpretable semantics except by private agreement No charts are provided for private use characters as any such characters are by their very nature defined only outside the context of this standard 3 Assignment editIn the Basic Multilingual Plane plane 0 the block titled Private Use Area has 6400 code points Planes 15 and 16 are almost note 1 entirely assigned to two further Private Use Areas Supplementary Private Use Area A and Supplementary Private Use Area B respectively In UTF 16 a subset of the high surrogates U DB80 U DBFF is used for these and only these planes and are called High Private Use Surrogates vteUnicode Private Use AreasDefinition by character property General Category Co a b Range Plane Block name Number of code points NoteU E000 U F8FF BMP 0 Private Use Area 6 400U F0000 U FFFFD c PUP 15 d Supplementary Private Use Area A 65 534 UTF 16 encodes these characters using codepoints from the block High Private Use Surrogates U DB80 U DBFF in the BMP U 100000 U 10FFFD c PUP 16 d Supplementary Private Use Area B 65 534Notes Unicode 15 1 Data The Unicode Standard Section 23 5 Private Use Characters Code points U FFFFE U FFFFF U 10FFFE and U 10FFFF are noncharacters not private use characters Private Use Plane Unicode has not published identifying names for planes 15 and 16 Chapter 2 8 says The two Private Use Planes Planes 15 and 16 while the PUA block names used are Supplementary PUA A and Supplementary PUA B Unicode PUA blocks edit There are three PUA blocks in Unicode 3 Private Use AreaRangeU E000 U F8FF 6 400 code 
points PlaneBMPScriptsUnknownAssigned6 400 code pointsUnused0 reserved code pointsUnicode version history1 0 0 1991 5 632 5 632 1 0 1 1992 6 400 768 Unicode documentationCode chart Web pageNote Version 1 0 1 moved and expanded the Private Use Area block previously located at U E800 U FDFF in version 1 0 0 4 5 6 Supplementary Private Use Area ARangeU F0000 U FFFFF 65 536 code points PlaneSPUA AScriptsUnknownAssigned65 534 code pointsUnused0 reserved code points 2 non charactersUnicode version history2 0 1996 65 534 65 534 ChartCode chartNote 5 6 Supplementary Private Use Area BRangeU 100000 U 10FFFF 65 536 code points PlaneSPUA BScriptsUnknownAssigned65 534 code pointsUnused0 reserved code points 2 non charactersUnicode version history2 0 1996 65 534 65 534 ChartCode chartNote 5 6 History edit In Unicode 1 0 0 the private use area extended from U E800 to U FDFF i e did not include U E000 E7FF but additionally included the U F900 FDFF range now occupied by CJK Compatibility Ideographs Alphabetic Presentation Forms and Arabic Presentation Forms A 7 This was changed to U E000 F8FF in Unicode 1 0 1 4 and remained so in Unicode 1 1 8 Contrary to misconception the range U D800 DFFF reserved for UTF 16 surrogates since Unicode 2 0 was not included in the private use range of any Unicode 1 x version Historically planes E0 224 through FF 255 and groups 60 96 though 7F 127 of the Universal Coded Character Set i e U E00000 through U FFFFFF and U 60000000 through U 7FFFFFFF were also designated as private use These ranges were removed from the specified private use ranges when the UCS was restricted to the seventeen planes reachable in UTF 16 9 Usage editStandardization initiative uses edit Many people and institutions have created character collections for the PUA Some of these private use agreements are published so other PUA implementers can aim for unused or less used code points to prevent overlaps Several characters and scripts previously encoded in private use agreements 
have actually been fully encoded in Unicode necessitating mappings from the PUA to other Unicode code points One of the more well known and broadly implemented PUA agreements is maintained by the ConScript Unicode Registry CSUR The CSUR which is not officially endorsed or associated with the Unicode Consortium provides a mapping for constructed scripts such as Klingon pIqaD and Ferengi script Star Trek Tengwar and Cirth J R R Tolkien s cursive and runic scripts Alexander Melville Bell s Visible Speech and Dr Seuss alphabet from On Beyond Zebra The CSUR previously encoded the undeciphered Phaistos characters as well as the Shavian and Deseret alphabets which have all been accepted for official encoding in Unicode Another common PUA agreement is maintained by the Medieval Unicode Font Initiative MUFI This project is attempting to support all of the scribal abbreviations ligatures precomposed characters symbols and alternate letterforms found in medieval texts written in the Latin alphabet The express purpose of MUFI is to experimentally determine which characters are necessary to represent these texts and to have those characters officially encoded in Unicode As of Unicode version 5 1 152 MUFI characters have been incorporated into the official Unicode encoding needs update Some agreed upon PUA character collections exist in part or whole because the Unicode Consortium is in no hurry to encode them Some such as unrepresented languages are likely to end up encoded in the future Some unusual cases such as fictional languages are outside the usual scope of Unicode but not explicitly ruled out by the principles of Unicode and may show up eventually such as the Star Trek and Tolkien writing systems In other cases the proposed encoding violates one or more Unicode principles and hence is unlikely to ever be officially recognized by Unicode mostly where users want to directly encode alternate forms ligatures or base character plus diacritic combinations such as the TUNE 
scheme Publishing organisation Topic PUA area used FontCSUR Artificial and some ancient medieval scripts PUA BMP and Plane 15 Code2000MUFI Medieval scripts PUA BMP severalSIL Phonetics and languages PUA BMP Charis SILTITUS Ancient and medieval scripts PUA BMP TITUS Cyberbit BasicEmoji is an encoding for picture characters or emoticons used in Japanese wireless messages and webpages With Unicode 6 0 and later many of these have been encoded in the block Miscellaneous Symbols And Pictographs and elsewhere in the SMP GB T 20542 2006 Tibetan Coded Character Set Extension A and GB T 22238 2008 Tibetan Coded Character Set Extension B are Chinese national standards that use the PUA to encode precomposed Tibetan ligatures GB 18030 and GBK use the PUA to provisionally encode characters not found in Unicode standards at the time of publication most have been encoded since then The Institute of the Estonian Language uses the PUA to encode Latin and Cyrillic precomposed characters 10 that have no Unicode encoding The Free Tengwar Font Project uses a different mapping from the ConScript Unicode Registry that largely follows Michael Everson s 2001 03 07 Tengwar discussion paper but diverges in some details The MARC 21 standard uses the PUA to encode East Asian characters present in MARC 8 11 that have no Unicode encoding The SIL Corporate PUA uses the PUA to encode characters used in minority languages that have not yet been accepted into Unicode The STIX Fonts project uses the PUA to provide a comprehensive font set of mathematical symbols and alphabets many of which are also available in the SMP now e g in the Mathematical Alphanumeric Symbols block The Tamil Unicode New Encoding TUNE 12 is a proposed scheme for encoding Tamil that overcomes perceived deficiencies in the current Unicode encoding Vendor use edit Informally the range U F000 through U F8FF is known as the Corporate Use Area This originates from early versions of Unicode which defined an End User Zone extending 
from U E000 upward and a Corporate Use Zone extending from U F8FF downward with the boundary between the two left undefined 8 The Adobe Glyph List used to use the PUA for some of its glyphs 13 Apple lists a range of 1 280 characters in its developer documentation 14 from U F400 U F8FF within the PUA for Apple s use Of those only 311 are used in the range U F700 U F8FF NeXT NeXTSTEP and OPENSTEP and Apple Mac OS X AppKit 15 One of these is U F8FF the Apple logo generally supported by Apple s 8 bit sets WGL4 uses the PUA U F001 and U F002 to encode duplicates of the ligatures fi U FB01 fl U FB02 16 Microsoft s defunct Services For Macintosh feature used U F001 through U F029 as replacements for special characters allowed in HFS but forbidden in NTFS and U F02A for the Apple logo 17 18 In old versions of its RichEdit component Microsoft mapped U F020 U F0FF within the PUA to symbol fonts For any character in this range RichEdit would show a character from a symbol font instead of the end user defined character EUDC 19 20 AutoCAD clarification needed uses U F8FC U F8FE for diameter sign plus minus sign and degree sign respectively Some fonts place Windows logo key at U F000 Number U F000 is a numeral succession starting at 13 or 18 in some video games like Agar io On Ubuntu U E0FF is displayed as the Circle Of Friends logo 21 and U F200 is ubuntu in the Ubuntu typeface with a superscripted Circle Of Friends this itself is U F0FF 22 The 3270 font includes the Debian logo at U F100 In the Linux Libertine font U E000 displays Tux the mascot of Linux The Font Awesome icon font uses the PUA to display various glyphs Powerline a status line plugin for Vim uses U E0A0 U E0A2 and U E0B0 U E0B3 for extra box drawing characters 23 24 25 On the Fira Sans typeface used in Firefox OS U E003 is displayed as the Mozilla logo the dinosaur head Lotus Multi Byte Character Set LMBCS the encoding and character set internally used by Lotus IBM Lotus 1 2 3 Symphony SmartSuite Notes Domino 
as well as a number of third-party products such as Microsoft Works, uses some characters (U+F862–U+F89F and U+F8FB–U+F8FE) in the Private Use Area for symbols not defined in Unicode. Of these, U+F8FB is known to be reserved for a crown currency symbol ("Kr"), and U+F8FC and U+F8FD were later mapped to U+FB02 (fl) and U+FB01 (fi) respectively. Additionally, when UTF-16 codes are embedded in LMBCS, the UTF-16 codes corresponding to U+F601 through U+F6FF are substituted for UTF-16 codes which would contain null bytes, since LMBCS is designed to not contain embedded null bytes.[26][27]
- IBM reserved several code page IDs for PUA code pages: code page 1446 for the generic plane 15, code page 1447 for the generic plane 16, code page 1448 for the generic BMP PUA, code page 1445 ("IBM AFP PUA No. 1") for plane 15 with IBM allocations in U+FFF00–U+FFFFD,[28][29] and code page 1449 ("IBM default PUA") for the BMP PUA with IBM allocations in U+F83D–U+F8FF.[30][31]
- The file system found in Windows uses the U+F000 to U+F0FF block to escape special characters.
- NetApp translates characters in filenames that are allowed on Unix but invalid for SMB clients to PUA characters.[32]
- Twitter's Chirp font provides some additional icons, such as U+E000, which corresponds to a left-down arrow; U+EA00, which corresponds to the Twitter bird; and U+F8FF, which corresponds to an Apple logo (possibly for compatibility with Apple fonts).[33]

Private-use characters in other character sets

The concept of reserving specific code points for private use is based on similar earlier usage in other character sets. In particular, many otherwise obsolete characters in East Asian scripts continue to be used in specific names or other situations, and so some character sets for those scripts made allowance for private-use characters (such as the user-defined planes of CNS 11643, or gaiji in certain Japanese encodings). The Unicode standard references these uses under the name "End User Character Definition" (EUCD).[3] Additionally, the C1 control block contains two codes
intended for private use "control functions" by ECMA-48: 0x91 private use one (PU1) and 0x92 private use two (PU2).[34][35] Unicode includes these at U+0091 <control-0091> and U+0092 <control-0092> but defines them as control characters (category Cc), not private-use characters (category Co).[5][36]

Encodings which do not have private use areas but have more or less unused areas, such as ISO/IEC 8859 and Shift JIS, have seen uncontrolled variants of these encodings evolve.[37] For Unicode, software companies can use the Private Use Areas for their desired additions.

Notes

The last two characters of every plane are defined to be noncharacters. The remaining 65,534 characters of each of planes 15 and 16 are assigned as private-use characters.

References

1. "Glossary of Unicode Terms: Private Use Area (PUA)". Unicode Consortium.
2. "Unicode Character Encoding Stability Policy". 2021-11-10. Retrieved 2022-03-03.
3. "Chapter 23: Special Areas and Format Characters" (PDF). The Unicode Standard, Version 14.0, Core Specification. Private-Use characters.
4. "Unicode 1.0.1" (PDF). The Unicode Standard. 1992-11-03. Archived (PDF) from the original on 2016-07-02. Retrieved 2016-07-09.
5. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
6. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
7. "3.5 Private Use Area" (PDF). The Unicode Standard, Version 1.0, Volume 1. Unicode Consortium. 1991. pp. 118–119. ISBN 0-201-56788-1. Archived (PDF) from the original on 2021-10-21. Retrieved 2021-10-11.
8. "2.0 Changes in Unicode 1.0" (PDF). The Unicode Standard, Version 1.1. Unicode Consortium. pp. 3–4. UTR #4. Archived (PDF) from the original on 2021-11-20. Retrieved 2021-10-11.
9. Whistler, Ken (2000). "Necessary changes for ISO/IEC 10646 regarding the PUA". UTC 00-015. Archived from the original on 2021-06-23. Retrieved 2021-01-30.
10. "Letter Database". Eki.ee. Archived from the original on 2018-05-21. Retrieved 2013-04-11.
11. "Character Sets: East Asian Characters: Alternative Unicode Mappings for MARC 21 Characters Assigned to the Private Use Area (PUA)". MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media. Library of Congress. 2004-09-02. Archived from the original on 2013-08-19. Retrieved 2013-04-11.
12. "tunerfc.tn.nic.in". tunerfc.tn.nic.in. Archived from the original on 2010-07-29. Retrieved 2013-04-11.
13. "Unicode Corporate Use Subarea as used by Adobe Systems". October 22, 1998. Archived from the original on October 9, 2002. Retrieved May 12, 2021.
14. "NSOpenStepUnicodeReservedBase". Apple Developer Documentation. Apple Inc. Archived from the original on 2020-11-06. Retrieved 2020-10-16.
15. Apple Computer, Inc. (2005) [1994]. "CORPCHAR.TXT: Registry (external version) of Apple use of Unicode corporate-zone characters". c03. Unicode, Inc. Archived from the original on 2020-10-30. Retrieved 2020-10-16.
16. "WGL4 Unicode Range U+2013 through U+FB02". Microsoft. Archived from the original on 2014-07-17.
17. "SFM Converts Macintosh HFS Filenames to NTFS Unicode". Microsoft Support. February 24, 2014. Archived from the original on May 27, 2016.
18. "ntfs.util.c". 2008. Archived from the original on 2018-08-07. Retrieved 2018-08-07. "Invalid NTFS filename characters are encodeded [sic] using the SFM (Services for Macintosh) private use Unicode characters."
19. "The range of characters between U+F020 and U+F0FF in the Private Use Area of Unicode is mapped to symbol fonts in Richedit 4.1". Microsoft Knowledge Base. Archived from the original on 2012-10-22.
20. "Handling of PUA Characters in Microsoft Software". SIL International. 2003-04-25. Archived from the original on 2015-05-11. Retrieved 2014-03-04.
21. "Comment #8: Bug #651606: circle of friends". Bugs: Ubuntu Font Family. Launchpad. 5 October 2010. Archived from the original on 2020-10-17. Retrieved 2020-10-17.
22. "Comment #2: Bug #853855". Bugs: Ubuntu Font Family. Launchpad. 26 September 2011. Archived from the original on 2020-10-17. Retrieved 2020-10-17.
23. "Powerline (status line plugin) question on Stack Exchange mentioning private use area characters". Archived from the original on 2015-03-12. Retrieved 2015-03-22.
24. "Pictures showing private use area characters in Powerline-patched fonts". Archived from the original on 2015-05-11. Retrieved 2015-03-22.
25. Li, Renzhi (2019-08-23). "Proposal to add additional characters into the Graphics for Legacy Computing block of the UCS" (PDF). Retrieved 2023-07-31.
26. "lmb-excp.ucm". GitHub. 2000-02-10. Archived from the original on 2022-01-25. Retrieved 2020-04-23.
27. "Anhang 2. Der Lotus Multibyte Zeichensatz (LMBCS)" [Appendix 2. The Lotus Multibyte Character Set (LMBCS)]. Lotus 1-2-3 Version 3.1 Referenzhandbuch [Lotus 1-2-3 Version 3.1 Reference Manual] (in German) (1 ed.). Cambridge, Massachusetts, US: Lotus Development Corporation. 1989. pp. A2-1 to A2-13. 302168.
28. "CPGID 01445 chart" (PDF). REGISTRY - Graphic Character Sets and Code Pages. 2012 [2011]. C-H 3-3220-050. "The area shown in the chart above represents only 254 bytes of row FF in plane 0F."
29. "CPGID 01445 - IBM AFP PUA No. 1". REGISTRY - Graphic Character Sets and Code Pages. 2012 [2011]. C-H 3-3220-050. "The area shown in the chart above represents only 254 bytes of row FF in plane 0F."
30. "CPGID 01449 - IBM default PUA". IBM Globalization: Code page identifiers. IBM. Archived from the original on 2015-09-16. "IBM has designated 195 positions from U+F83D to U+F8FF for use as IBM Corporate-zone and intends to use them consistently within IBM whenever there is a need to maintain the round trip integrity of IBM characters."
31. IBM (1997). "unicode.nam". "Allow the Unicode characters to be specified using either the IBM or PostScript-like names." (Included with Borgendale, Ken. "OS/2 Codepage and Keyboard Display Tools".)
32. "Configure character mapping for SMB file name translation on volumes". 9 December 2021. Retrieved 2022-10-14.
33. "Twitter Chirp Font - Copy Paste Dump". Retrieved 2022-02-08.
34. Standard ECMA-48, Fifth Edition, June 1991 (PDF). Sections 8.2.14 (Miscellaneous control functions), 8.3.100, 8.3.101.
35. ISO/TC97/SC2 (1983-10-01). C1 Control Character Set of ISO 6429 (PDF). ITSCJ/IPSJ. ISO-IR-77.
36. "Chapter 4: Character Properties" (PDF). The Unicode Standard, Version 14.0, Core Specification. Table 4-4.
37. "Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later". Archived from the original on 2021-08-31. Retrieved 2021-10-08.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Private_Use_Areas&oldid=1187031923"
