International Components for Unicode

International Components for Unicode (ICU) is an open-source project of mature C/C++ and Java libraries for Unicode support, software internationalization, and software globalization. ICU is widely portable to many operating systems and environments. It gives applications the same results on all platforms and between C, C++, and Java software. The ICU project is a technical committee of the Unicode Consortium and is sponsored, supported, and used by IBM and many other companies.[1]

Developer(s): Unicode Consortium
Initial release: 1999
Stable release: 73.2 / 15 June 2023
Repository: github.com/unicode-org/icu
Written in: C/C++ (C++20; using ICU requires at least C++11) and Java 8+
Operating system: Cross-platform
Type: Libraries for Unicode and internationalization
License: Unicode License
Website: icu.unicode.org

ICU provides the following services: Unicode text handling, full character properties, and character set conversions; Unicode regular expressions; full Unicode sets; character, word, and line boundaries; language-sensitive collation and searching; normalization, upper and lowercase conversion, and script transliterations; comprehensive locale data and resource bundle architecture via the Common Locale Data Repository (CLDR); multiple calendars and time zones; and rule-based formatting and parsing of dates, times, numbers, currencies, and messages. ICU formerly provided a complex text layout service for Arabic, Hebrew, Indic, and Thai, but it was deprecated in version 54 and removed completely in version 58 in favor of HarfBuzz.[2]
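To illustrate what "language-sensitive collation" means in practice, here is a minimal sketch. It deliberately uses the JDK's own `java.text.Collator` rather than ICU4J, so that it runs without extra dependencies; ICU4J offers a similarly named class with richer locale data, but nothing below should be read as ICU's API. The class and method names (`CollationSketch`, `binarySort`, `localeSort`) are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of ICU or the JDK.
final class CollationSketch {

    // Sorts by raw UTF-16 code unit order (String.compareTo),
    // which places all uppercase ASCII letters before lowercase ones.
    static List<String> binarySort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Sorts with a locale-aware collator, which compares base letters
    // first and treats case as a lower-priority difference.
    static List<String> localeSort(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");
        System.out.println(binarySort(words));                 // [Cherry, apple, banana]
        System.out.println(localeSort(words, Locale.ENGLISH)); // [apple, banana, Cherry]
    }
}
```

The binary sort is what a naive string comparison gives; the collator-based sort is closer to what a reader of English would expect in an index or menu.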

ICU provides more extensive internationalization facilities than the standard libraries for C and C++. ICU 72 updates to Unicode 15, and "In many formatting patterns, ASCII spaces are replaced with Unicode spaces (e.g., a "thin space")." ICU4J now requires Java 8, although "Most of the ICU 72 library code should still work with Java 7 / Android API level 21, but we no longer test with Java 7."[3] ICU 71 added phrase-based line breaking for Japanese (earlier methods did not work well for short Japanese text, such as titles and headings), support for Hindi written in Latin letters (hi_Latn, also referred to as "Hinglish"), and an update to time zone data version 2022a. ICU 70 added support for emoji properties of strings and can be built and used with C++20 compilers ("ICU operator==() and operator!=() functions now return bool instead of UBool, as an adjustment for incompatible changes in C++20");[4] as of that version, the minimum supported Windows version is Windows 7. ICU 67 supports Unicode 13.0 and handles the United Kingdom's withdrawal from the EU. ICU 64 supports Unicode 12.0, while ICU 64.2 added support for Unicode 12.1, i.e. the single new symbol for the current Japanese Reiwa era (support for it has also been backported to older ICU versions down to ICU 4.8.2).

ICU 58 (with Unicode 9.0 support) is the last version to support older platforms such as Windows XP and Windows Vista. Support for AIX, Solaris, and z/OS may also be limited in later versions, since building depends on compiler support.[5] ICU has been included as a standard component of Microsoft Windows since Windows 10 version 1703.[6]
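The replacement of ASCII spaces with Unicode spaces noted for ICU 72 can surprise code that compares formatted output against hard-coded strings. The sketch below shows one possible way to make such comparisons tolerant of the kind of space used. The set of code points handled here (U+00A0 no-break space, U+2009 thin space, U+202F narrow no-break space) is an assumption for illustration; the source does not list which characters appear in which patterns. The class and method names are hypothetical.

```java
// Hypothetical helper; not part of ICU or the JDK.
final class SpaceSketch {

    // Maps a few Unicode space characters to a plain ASCII space.
    // Assumed set: U+00A0, U+2009, U+202F. Extend as needed.
    static String normalizeSpaces(String s) {
        return s.replaceAll("[\\u00A0\\u2009\\u202F]", " ");
    }

    // Compares two strings while ignoring which kind of space was used.
    static boolean sameIgnoringSpaceKind(String a, String b) {
        return normalizeSpaces(a).equals(normalizeSpaces(b));
    }

    public static void main(String[] args) {
        String formatted = "3:45\u202FPM"; // as a formatter might emit it
        String expected = "3:45 PM";       // as a test might hard-code it
        System.out.println(formatted.equals(expected));                 // false
        System.out.println(sameIgnoringSpaceKind(formatted, expected)); // true
    }
}
```

A helper like this is best kept to tests and parsing of user input; text shown to users should keep the spaces the formatter chose.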

ICU has historically used UTF-16, and the Java version still uses only UTF-16; for C/C++, UTF-8 is also supported,[7][8] including the correct handling of "illegal UTF-8".[9]
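The difference between the two encodings, and what handling "illegal UTF-8" involves, can be seen with plain JDK calls. This is only a sketch of the general concepts using `java.lang.String` and `java.nio.charset.StandardCharsets`; it shows the JDK decoder's behavior, not ICU's, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of ICU.
final class EncodingSketch {

    // Number of UTF-16 code units (what String.length() reports).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (supplementary characters count once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes needed to encode the string as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Decodes bytes as UTF-8; the JDK substitutes U+FFFD for malformed input.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, written as a surrogate pair.
        String s = "a\uD83D\uDE00";
        System.out.println(utf16Units(s)); // 3
        System.out.println(codePoints(s)); // 2
        System.out.println(utf8Bytes(s));  // 5

        // 0xFF can never occur in well-formed UTF-8.
        byte[] illegal = { 0x41, (byte) 0xFF, 0x42 };
        System.out.println(decodeLenient(illegal).equals("A\uFFFDB")); // true
    }
}
```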

ICU 73.2 includes significant changes for GB18030-2022 compliance (the updated Chinese GB18030 standard is slightly incompatible with its predecessor): it has "a modified character conversion table, mapping some GB18030 characters to Unicode characters that were encoded after GB18030-2005". It also has a number of other changes, such as improved Japanese and Korean short-text line breaking, and in "English, the name “Türkiye” is now used for the country instead of “Turkey” (the alternate spelling is also available in the data)."[10]

ICU 74, planned for October 2023, will require C++17 (up from C++11) or C11 (up from C99), depending on which language is used.

Origin and development

After Taligent became part of IBM in early 1996, Sun Microsystems decided that the new Java language should have better support for internationalization. Since Taligent had experience with such technologies and were close geographically, their Text and International group were asked to contribute the international classes to the Java Development Kit as part of the JDK 1.1 internationalization APIs.[11] A large portion of this code still exists in the java.text and java.util packages. Further internationalization features were added with each later release of Java.

The Java internationalization classes were then ported to C++ and C[12] as part of a library known as ICU4C ("ICU for C"). The ICU project also provides ICU4J ("ICU for Java"), which adds features not present in the standard Java libraries. ICU4C and ICU4J are very similar, though not identical; for example, ICU4C includes a Regular Expression API, while ICU4J does not. Both frameworks have been enhanced over time to support new facilities and new features of Unicode and Common Locale Data Repository (CLDR).

ICU was released as an open-source project in 1999 under the name IBM Classes for Unicode. It was later renamed International Components for Unicode.[13] In May 2016, the ICU project joined the Unicode Consortium as technical committee ICU-TC, and the library sources are now distributed under the Unicode License.[14]

MessageFormat

ICU includes the MessageFormat class, a formatting system in which any number of arguments can control the plural form (plural, selectordinal) or a more general switch-case-style selection (select) for things like grammatical gender. These statements can be nested.[15] ICU MessageFormat was created by adding the plural and selection system to an identically named system in Java SE.
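As a rough illustration of argument-driven message selection, the sketch below uses the identically named Java SE class, `java.text.MessageFormat`, with its older `choice` sub-format, so that it runs on a plain JDK. It is not ICU's plural system: in ICU4J the same idea would be written with the `plural` keyword, for example a pattern along the lines of `{0, plural, one{# file} other{# files}}`, which is shown here only as a comment. The class name `MessageFormatSketch` and the method `describe` are hypothetical.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical illustration class; uses Java SE's MessageFormat, not ICU's.
final class MessageFormatSketch {

    // A "choice" pattern picks a sub-message by numeric range:
    //   0#  -> used for values >= 0 and < 1
    //   1#  -> used for exactly 1
    //   1<  -> used for values greater than 1
    // ICU4J's MessageFormat would express this with "plural" instead,
    // e.g. "{0, plural, one{# file} other{# files}}" (not run here).
    static final String PATTERN =
        "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.";

    static String describe(int count) {
        MessageFormat fmt = new MessageFormat(PATTERN, Locale.ENGLISH);
        return fmt.format(new Object[] { count });
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // There are no files.
        System.out.println(describe(1)); // There is one file.
        System.out.println(describe(3)); // There are 3 files.
    }
}
```

The numeric ranges of `choice` encode English rules by hand; the plural categories added by ICU let the locale data decide which form applies, which matters for languages with more than two plural forms.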

Alternatives

An alternative to using ICU directly from C++ is Boost.Locale, a C++ wrapper for ICU that also allows other backends.[16] The argument for using it rather than ICU directly is that ICU "is absolutely unfriendly to C++ developers. It ignores popular C++ idioms (the STL, RTTI, exceptions, etc), instead mostly mimicking the Java API."[17][18] Another claim, that ICU supports only UTF-16 (and should be avoided for that reason), is no longer true, since ICU now also supports UTF-8 for C and C++.[7]

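See also

  • Apple Advanced Typography
  • Apple Type Services for Unicode Imaging
  • GNU gettext
  • Graphite (SIL)
  • NetRexx, ICU license
  • OpenType
  • Pango
  • Uconv
  • Uniscribe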

References

  1. "ICU - International Components for Unicode". site.icu-project.org. Archived from the original on 2021-08-27. Retrieved 2011-11-14.
  2. "Layout Engine - ICU User Guide". userguide.icu-project.org.
  3. "ICU - International Components for Unicode - ICU 72". icu.unicode.org. Retrieved 2023-01-24.
  4. "ICU - International Components for Unicode - ICU 70". icu.unicode.org. Retrieved 2023-01-24.
  5. "Download ICU 64 - ICU - International Components for Unicode". site.icu-project.org. Retrieved 2019-10-20.
  6. Chen, Raymond (27 May 2021). "How can I convert between IANA time zones and Windows registry-based time zones?". The Old New Thing. Microsoft.
  7. "UTF-8". ICU Documentation. Retrieved 2022-05-24.
  8. "UTF-8 - ICU User Guide". userguide.icu-project.org. Retrieved 2018-04-03.
  9. "#13311 (change illegal-UTF-8 handling to Unicode "best practice")". bugs.icu-project.org. Retrieved 2018-04-03.
  10. "ICU - International Components for Unicode - ICU 73". icu.unicode.org. Retrieved 2023-09-24.
  11. Laura Werner (1999). "Getting Java ready for the world: A brief history of IBM and Sun's internationalization efforts". Archived from the original on 2021-11-17. Retrieved 2007-05-23.
  12. "ICU User Guide". userguide.icu-project.org.
  13. "ICU Project Management Committee". Archived from the original on 2021-08-28. Retrieved 2012-08-17.
  14. "ICU joins the Unicode Consortium". Unicode, Inc. 2016-05-16. Retrieved 2016-08-01.
  15. "Formatting Messages". ICU User Guide.
  16. "Boost.Locale: Using Localization Backends". www.boost.org. Retrieved 2022-05-24.
  17. "Boost.Locale: Design Rationale". www.boost.org. Retrieved 2022-05-24.
  18. "ICU vs Boost Locale in C++". Stack Overflow. Retrieved 2022-05-24.

External links

  • Official website  
  • International Components for Unicode transliteration services
  • Online ICU editor
