
Tamil All Character Encoding

Tamil All Character Encoding (TACE16) is a 16-bit, Unicode-based character encoding scheme for the Tamil language.[1][2]

This encoding is not used on the web. Although some other encodings have been used for Tamil, Unicode (i.e. UTF-8) has 100.0% use on the web.

Keyboard drivers and fonts

Keyboard drivers for this encoding scheme are available free of charge from the Tamil Virtual University website.[3][4] They use the Tamil99 and Tamil Typewriter keyboard layouts, which are approved by the Tamil Nadu Government, and map input keystrokes to the corresponding TACE16 characters.[2] To read files created with the TACE16 scheme, the corresponding Unicode fonts for this encoding are available from the same website.[3][4] These fonts map glyphs not only to TACE16 characters but also to the present Unicode encoding for both ASCII and Tamil characters, so they provide backward compatibility for reading existing files created with the present Unicode encoding for Tamil.

Character set

All characters of this encoding scheme are located in the private use area of the Basic Multilingual Plane of Unicode's Universal Character Set.
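Because TACE16 characters are private-use code points, Unicode itself gives them no name or properties beyond "private use"; their meaning comes only from TACE16 fonts and software. The minimal Python sketch below illustrates this with the standard `unicodedata` module. It assumes the BMP Private Use Area is U+E000 to U+F8FF, takes U+E213 (கி in the TACE16 chart) as a sample, and the helper `in_bmp_private_use_area` is a hypothetical name for illustration.

```python
import unicodedata

# Bounds of the Private Use Area in the Basic Multilingual Plane.
BMP_PUA_FIRST = 0xE000
BMP_PUA_LAST = 0xF8FF


def in_bmp_private_use_area(code_point: int) -> bool:
    """Return True if code_point lies in the BMP Private Use Area (hypothetical helper)."""
    return BMP_PUA_FIRST <= code_point <= BMP_PUA_LAST


tace16_ki = chr(0xE213)      # கி according to the TACE16 chart
unicode_ki = "\u0b95\u0bbf"  # கி in Unicode Tamil: KA + VOWEL SIGN I

print(in_bmp_private_use_area(ord(tace16_ki)))   # True
print(unicodedata.category(tace16_ki))           # Co (private use)
print(unicodedata.name(tace16_ki, "<no name>"))  # <no name>
print([unicodedata.name(ch) for ch in unicode_ki])
# ['TAMIL LETTER KA', 'TAMIL VOWEL SIGN I']
```

In Unicode Tamil the same letter is two named, standardized characters; in TACE16 it is one code point whose interpretation depends entirely on agreement between the software and fonts involved.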

Tamil All Character Encoding (TACE16) Character Set
Consonants run across the columns and vowels down the rows. Column headings (code-point prefixes): E10 E18 E1A E1F E20 E21 E22 E23 E24 E25 E26 E27 E28 E29 E2A E2B E2C E2D E2E E2F E30 E31 E32 E33 E34 E35 E36 E37 E38 E39 E3A E3B E3C E3D E3E E3F
Each row lists its non-empty cells from left to right; a character's code point is its column prefix followed by its row digit (for example, column E21, row 3 is E213, கி).
0 ௦ அரைக்கால் க் ங் ச் ஞ் ட் ண் த் ந் ப் ம் ய் ர் ல் வ் ழ் ள் ற் ன்
1 ௧ கால் அ க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
2 ௨ அரை ஆ கா ஙா சா ஞா டா ணா தா நா பா மா யா ரா லா வா ழா ளா றா னா
3 ௩ முக்கால் இ கி ஙி சி ஞி டி ணி தி நி பி மி யி ரி லி வி ழி ளி றி னி
4 ௪ அரைவீசம் ஈ கீ ஙீ சீ ஞீ டீ ணீ தீ நீ பீ மீ யீ ரீ லீ வீ ழீ ளீ றீ னீ
5 ௫ வீசம் உ கு ஙு சு ஞு டு ணு து நு பு மு யு ரு லு வு ழு ளு று னு
6 ௬ மூவீசம் ஊ கூ ஙூ சூ ஞூ டூ ணூ தூ நூ பூ மூ யூ ரூ லூ வூ ழூ ளூ றூ னூ
7 ௭ அரைமா எ கெ ஙெ செ ஞெ டெ ணெ தெ நெ பெ மெ யெ ரெ லெ வெ ழெ ளெ றெ னெ
8 பௌர்ணமி ௮ ஒருமா ஏ கே ஙே சே ஞே டே ணே தே நே பே மே யே ரே லே வே ழே ளே றே னே
9 அமாவாசை ௯ இரண்டுமா ஐ கை ஙை சை ஞை டை ணை தை நை பை மை யை ரை லை வை ழை ளை றை னை
A கார்த்திகை மும்மா ஒ கொ ஙொ சொ ஞொ டொ ணொ தொ நொ பொ மொ யொ ரொ லொ வொ ழொ ளொ றொ னொ
B ராஜ நாலுமா ஓ கோ ஙோ சோ ஞோ டோ ணோ தோ நோ போ மோ யோ ரோ லோ வோ ழோ ளோ றோ னோ
C ௐ முந்திரி ஔ கௌ ஙௌ சௌ ஞௌ டௌ ணௌ தௌ நௌ பௌ மௌ யௌ ரௌ லௌ வௌ ழௌ ளௌ றௌ னௌ
D அரைக்காணி ஃ
E காணி
F முக்காணி
Note (cell categories): newly added, not present in Unicode 6.3; allocated for research (NLP); for future use.
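The chart's layout can be read arithmetically. The Python sketch below decodes the vowel column (E20x) and the 18 consonant columns (E21x to E32x) into ordinary Unicode Tamil strings. It assumes, from the chart, that row 0 of a consonant column is the pure consonant (with pulli), that rows 1 to C combine it with the twelve vowels in order, and that column E21 is க. The function name `tace16_to_unicode` is hypothetical, and the Grantha columns and the numeral and fraction blocks are not handled.

```python
# Unicode Tamil equivalents for the parts of the TACE16 chart handled here.
# அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ  (TACE16 E201..E20C, in chart order)
VOWELS = "\u0b85\u0b86\u0b87\u0b88\u0b89\u0b8a\u0b8e\u0b8f\u0b90\u0b92\u0b93\u0b94"
CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"  # TACE16 columns E21..E32
PULLI = "\u0bcd"  # virama (்) marking a pure consonant
# Vowel signs for vowels 1..12; vowel 1 (அ) is inherent, so it has no sign.
VOWEL_SIGNS = ["", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
               "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]


def tace16_to_unicode(code_point: int) -> str:
    """Convert one TACE16 vowel or (vowel-)consonant code point to Unicode Tamil."""
    if 0xE201 <= code_point <= 0xE20C:
        return VOWELS[code_point - 0xE201]
    column = (code_point >> 4) - 0xE21  # 0 for க, 1 for ங, ...
    row = code_point & 0xF              # 0 = pure consonant, 1..12 = vowel number
    if 0 <= column < len(CONSONANTS) and row <= 12:
        consonant = CONSONANTS[column]
        return consonant + (PULLI if row == 0 else VOWEL_SIGNS[row - 1])
    raise ValueError(f"U+{code_point:04X} is outside the range handled here")


for cp in (0xE203, 0xE210, 0xE211, 0xE213, 0xE32C):
    print(f"U+{cp:04X}", tace16_to_unicode(cp))
```

Each TACE16 vowel-consonant maps to one or two Unicode code points, which is the asymmetry the following sections discuss.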

Analysis of TACE16 over present Unicode standard for Tamil language

Issues with the present Unicode for Tamil language

The TACE16 task force considers the present Unicode standard for Tamil inadequate for efficient and effective use of Tamil in computers, for the following reasons:[1]

  1. Unicode Tamil has code positions for only 31 of the 247 Tamil characters (12 vowels, 18 consonants, 216 vowel-consonants and the aytham). These 31 are the 12 vowels, the 18 agara-uyirmeys and the aytham; the five Grantha agara-uyirmeys, which also have code points in Unicode Tamil, are not counted. The other Tamil characters have to be rendered by separate software. Thus only about 12.6% (31 of 247) of Tamil characters have code space in the present Unicode Tamil, and the remaining 87.4%, which are used in general text interchange, do not.
  2. The uyir-meys left out of the present Unicode Tamil are simple characters, just as A, B, C and D are characters in English. Uyir-meys are not glyphs, ligatures or conjunct characters, as Unicode assumes; ka, kA, ki, kI, etc. are characters in Tamil.
  3. In typical plain Tamil text, vowel-consonants (uyir-meys) make up 64 to 70% of letters, vowels (uyir) 5 to 6% and consonants (meys) 25 to 30%. Breaking such high-frequency letters as vowel-consonants into glyphs is highly inefficient.
  4. An encoding that requires a rendering engine to realize a character while computing is not suitable for applications such as system software development in Tamil, searching and sorting, and natural language processing (NLP) in Tamil. It consumes extra time and space, making computing highly inefficient. Such applications require a Level-1 implementation, in which, as for English, every character of the language has its own code position in the encoding.
  5. This encoding is based on ISCII (1988), so the characters are not in their natural order, and a complex collation algorithm is required to sort them into that order.
  6. It uses multiple code points to render single characters. Multiple code points lead to security vulnerabilities and ambiguous combinations, and require the use of normalization.
  7. Even simple operations such as counting letters, sorting and searching are inefficient.
  8. It requires hidden characters such as ZWJ/ZWNJ.
  9. It needs an exception table to prevent illegal combinations of code points.
  10. The Unicode Indic block is built on an enormous, complex, error-prone edifice, based on an encoding that is not built to last.
  11. The very first code point in the block is labelled "Tamil Sign Anusvara - Not used in Tamil".
  12. It assumes the same collation as Devanagari and incorrectly uses ambiguous encodings to render the same character.
  13. It encodes 23 vowel-consonants (the 23 consonants with the inherent vowel அ) and calls them consonants, against Tamil grammar.
  14. It is unnatural for speech-to-text and text-to-speech.
  15. It is inefficient to store, transmit and retrieve (for example, in file reading and writing, or over the Internet).
  16. Complex processing hinders development.
  17. String comparison requires normalization.
  18. A sequence of characters may correspond to a single glyph; for example, ச + ◌ெ + ◌ா = சொ. Characters are not graphemes: according to Unicode, சொ is a grapheme, but ச, ◌ெ and ◌ா are characters.
  19. It requires dynamic composition, in which a text element is encoded as a base character followed by one or more combining marks.
  20. There are two methods of rendering the vowel-consonants, which leads to ambiguity in rendering characters.
  21. The present Unicode is not efficient for parsing. For example, the name திருவள்ளுவர் looks like it has seven letters, but according to Unicode it has twelve characters: த ◌ி ர ◌ு வ ள ◌் ள ◌ு வ ர ◌்
  22. To count the letters in this name properly, an expert developer had to write a complex program and present it as a technical paper at a Tamil computing conference, whereas counting the letters in an English word is an exercise for a beginning programmer. Such problems arise because Unicode treats a simple script such as Tamil as a complex script. For example, in the Python library open-tamil,[5] which uses the present Unicode standard for Tamil, counting the Tamil letters in a text requires first parsing the text into a list with the function tamil.utf8.get_letters and then taking the length of that list.[6] This kind of extra programming logic, or an additional framework layer, is needed only because a simple script such as Tamil is treated as a complex one.
  23. Unicode's policy is to encode only characters, not glyphs. However, the Unicode Tamil standard encodes the vowel signs as combining characters.[7] On their own these signs have no meaning to a Tamil reader, yet character-shaping engines display them as-is when they detect a blank space between a sign and its base character. Unicode thus introduces the dotted circle as a Tamil character.
  24. Unicode Tamil is not fully supported in many platforms primarily because Tamil is treated as a complex script that requires complex processing.
  25. Since all the inefficiencies above consume more processor cycles than necessary, they increase the lifetime power (electricity) usage of a machine that processes Unicode Tamil. For example, processing the single Tamil character kI (கீ) requires processing both the consonant and the vowel sign, which doubles the processor cycles consumed.
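Points 17, 18, 21 and 22 above (code points versus letters, and the need for normalization) can be checked with Python's standard `unicodedata` module. The sketch below is a simplified illustration, not open-tamil's `get_letters`: the hypothetical `tamil_letters` helper treats a letter as a base character followed by any combining marks (general category Mn or Mc), which suffices for these examples but does not treat conjuncts such as க்ஷ as single letters and ignores ZWJ/ZWNJ.

```python
import unicodedata

# திருவள்ளுவர் (Thiruvalluvar) in Unicode Tamil, written as escapes
name = "\u0ba4\u0bbf\u0bb0\u0bc1\u0bb5\u0bb3\u0bcd\u0bb3\u0bc1\u0bb5\u0bb0\u0bcd"


def tamil_letters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified: conjuncts such as க்ஷ and ZWJ/ZWNJ are not handled.
    """
    letters: list[str] = []
    for ch in text:
        # Tamil vowel signs and the pulli have general category Mn or Mc.
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


print(len(name))                 # 12 code points
print(len(tamil_letters(name)))  # 7 letters
print(tamil_letters(name))

# சொ can be encoded with one vowel sign (U+0BCA) or with two (U+0BC6 U+0BBE).
so_single = "\u0b9a\u0bca"
so_pair = "\u0b9a\u0bc6\u0bbe"
print(so_single == so_pair)                                # False
print(unicodedata.normalize("NFC", so_pair) == so_single)  # True
```

The two spellings of சொ are canonically equivalent, so they compare equal only after normalization; this is the extra step a plain code-point comparison misses.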

Analysis of TACE16 over Unicode Tamil

The following results compare the current Unicode encoding for Tamil with TACE16 for e-governance and browsing:[1]

  1. For data storage applications, TACE16 is about 5.46 to 11.94 percent more efficient than Unicode Tamil.
  2. For sorting index data, TACE16 is about 18.69 to 22.99 percent more efficient than Unicode Tamil.
  3. When the entire data is in Tamil, TACE16 is about 25.39% more efficient than Unicode Tamil. However, the default (binary) collation sequence obtained from the code-space values in the new TACE16 does not follow Tamil dictionary order: some uyir-meys (the agara-uyirmeys) take precedence over the vowels and the other uyir-meys, the vowels and agara-uyirmeys being in the 0B80-0B8F block and the other uyir-meys in the 0800-08FF range. For this reason, sorted Unicode data looks better than sorted TACE16 data.
  4. Sorting is about 0.31 to 16.96 percent faster with TACE16 than with Unicode Tamil.
  5. Index creation is 36.7% faster on TACE16 data than on Unicode data.
  6. For full-key search on indexed fields, TACE16 performed better than Unicode Tamil by up to 24.07%. For non-indexed fields, TACE16 also performed better, by up to 20.9%.
  7. Rendering of static Tamil data was satisfactory with TACE16.
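One source of the storage difference is that TACE16 spends one code point per letter, whereas Unicode Tamil often needs two. The Python sketch below compares encoded sizes of திருவள்ளுவர் in both schemes. The TACE16 code points are read off the character chart (for example, தி = U+E273) and are an assumption of this example; it only illustrates the mechanism and does not reproduce the benchmark figures above, which come from the task force's own tests.

```python
# திருவள்ளுவர் in Unicode Tamil: 12 code points (see the issues list above).
unicode_text = "\u0ba4\u0bbf\u0bb0\u0bc1\u0bb5\u0bb3\u0bcd\u0bb3\u0bc1\u0bb5\u0bb0\u0bcd"

# The same name in TACE16, one code point per letter, read off the chart:
# தி E273, ரு E2C5, வ E2E1, ள் E300, ளு E305, வ E2E1, ர் E2C0
tace16_text = "".join(chr(cp) for cp in
                      (0xE273, 0xE2C5, 0xE2E1, 0xE300, 0xE305, 0xE2E1, 0xE2C0))

for label, text in (("Unicode Tamil", unicode_text), ("TACE16", tace16_text)):
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))  # no byte-order mark
    print(f"{label}: {len(text)} code points, "
          f"{utf8_size} bytes UTF-8, {utf16_size} bytes UTF-16")
# Unicode Tamil: 12 code points, 36 bytes UTF-8, 24 bytes UTF-16
# TACE16: 7 code points, 21 bytes UTF-8, 14 bytes UTF-16
```

Both Tamil block and private-use code points take three bytes each in UTF-8 and two in UTF-16, so the saving comes entirely from the smaller number of code points.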

Advantages of TACE16 over Unicode Tamil

According to the task force, the TACE16 character-encoding scheme not only overcomes all the issues with the present Unicode standard for Tamil listed above, but also provides major improvements in both processing time and processing space. It has the following additional advantages:[1]

  1. The encoding is universal, since it encompasses all characters found in general Tamil text interchange.
  2. Collation is sequential, following the code values.
  3. The encoding is unambiguous.
  4. Any given code point always represents the same character.
  5. There is none of the ambiguity found in the present Unicode Tamil.
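Because TACE16 places each consonant's forms in one column, in the order pure consonant, then the consonant with அ, ஆ and so on up to ஔ, a plain code-point sort follows that order. The Python sketch below contrasts this with a code-point sort of Unicode Tamil, where the pulli (U+0BCD) has a higher value than every vowel sign, so க் sorts after கௌ. It assumes the chart layout above and uses a plain binary sort, not a locale-aware sort or the Unicode Collation Algorithm.

```python
# Unicode Tamil: க், க, கா, கௌ (in the order of the TACE16 column)
unicode_words = ["\u0b95\u0bcd", "\u0b95", "\u0b95\u0bbe", "\u0b95\u0bcc"]
# The same four letters in TACE16 (column E21 of the chart)
tace16_words = [chr(0xE210), chr(0xE211), chr(0xE212), chr(0xE21C)]

# Python compares strings code point by code point, i.e. a binary sort.
print([unicode_words.index(w) for w in sorted(unicode_words)])  # [1, 2, 3, 0]
print([tace16_words.index(w) for w in sorted(tace16_words)])    # [0, 1, 2, 3]
```

The TACE16 list comes back in column order, while the Unicode list moves க் to the end.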

Because of such issues in the Unicode Tamil encoding, a proposal to re-encode Tamil was submitted to Unicode.[8] Unicode rejected it, saying that re-encoding would be damaging and that there was no convincing evidence that the Unicode Tamil encoding is bad.[9]

This system has the following advantages for computer programming:

  • Software to accommodate Tamil characters and their processing is simplified.
  • Sorting and searching are very simple.
  • For a machine, TACE16 takes fewer CPU cycles (and so uses less electricity) than Unicode Tamil.
  • TACE16 allows programming based on Tamil grammar, which is not easy in Unicode Tamil (it needs extra framework development).
  • The encoding is very efficient to parse: characters can be composed and parsed with simple arithmetic. Of the two methods below, the second performs especially well over large character sets. Both methods also follow the basic rule of Tamil grammar that consonant + vowel = vowel-consonant (uyirmei), which Unicode Tamil does not follow.
     Method 1 (simple arithmetic): க் + இ = கி
         E210 (க்) + E203 (இ) - E200 (constant) = E213 (கி)
     Method 2 (bitwise): க் (E210) + இ (E203) = கி (E213)
         E210 (க்) | (E203 (இ) & 000F (constant)) = E213 (கி)
  • A vowel-consonant (uyirmei) can be split into its vowel and consonant very efficiently, which matters for performance over large data.
     /* To get the vowel */      E213 (கி) & F20F (constant) = E203 (இ)
     /* To get the consonant */  E213 (கி) & FFF0 (constant) = E210 (க்)
  • It is very efficient to determine whether a character is a vowel, a consonant, a vowel-consonant (uyirmei) or a number.
     /* |  - bitwise OR      &  - bitwise AND      !  - bitwise NOT
      * ^  - bitwise XOR     || - conditional OR   && - conditional AND */
     c = the TACE16 code point of a Tamil character

     /* Is c a vowel? */
     /* Method 1 */
     ((c >= E201) && (c <= E20C)) == true                               // => Vowel
     /* Method 2: only if code positions E200, E20E and E20F are not used for any other purpose */
     (((c & E20F) == c) && (c != E20D)) == true                         // => Vowel
     ((!((c & E20F) ^ c)) && (c != E20D)) == true                       // => Vowel

     /* Is c a consonant or a vowel-consonant (uyirmei)? */
     x = (c & 000F)   // for a vowel or vowel-consonant, x is the vowel's number, starting from 1
     (((c >= E210) && (c <= E38C)) && (x == 0)) == true                 // => Consonant
     (((c >= E210) && (c <= E38C)) && ((x >= 1) && (x <= 12))) == true  // => Vowel-consonant (uyirmei)

     /* Is c a Tamil number? */
     /* Method 1 */
     ((c >= E180) && (c <= E18C)) == true                               // => Tamil number
     /* Method 2: only if code positions E18D-E18F are not used for any other purpose.
        These mask tests also accept E100-E10F and any other value whose set bits lie within E18F. */
     (c & E18F) == c                                                    // => Tamil number
     (!((c & E18F) ^ c)) == true                                        // => Tamil number
     /* If E18D-E18F are used for any other purpose, use Method 1 or the following: */
     ((!((c & E18F) ^ c)) && ((c & 000F) <= 12)) == true               // => Tamil number
  • Converting numbers to Tamil numbers (the new Tamil number format) and back is very easy (as in Unicode Tamil).
     /* A direct digit-to-digit conversion is enough in both directions. */
     /* Number to Tamil number */
     n = a single-digit number (0-9)
     /* Method 1 */
     (n + E180)   // => Tamil number
     /* Method 2 */
     (n | E180)   // => Tamil number
     /* Tamil number to number */
     c = a single-digit Tamil number character (௦-௯)
     (c & 000F)   // => Number
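The arithmetic and bitwise rules above can be collected into a small runnable Python sketch. The function names are hypothetical; the constants (E200, 000F, F20F, FFF0, E180) and the ranges come from the formulas above, and the classifier uses the explicit range checks (Method 1) rather than the mask tests.

```python
VOWEL_BASE = 0xE200        # vowels are E201..E20C
LOW_NIBBLE = 0x000F
VOWEL_MASK = 0xF20F
CONSONANT_MASK = 0xFFF0
TAMIL_DIGIT_BASE = 0xE180  # ௦ is E180, ௯ is E189


def compose_method1(consonant: int, vowel: int) -> int:
    """Consonant + vowel by addition, e.g. E210 + E203 - E200 = E213."""
    return consonant + vowel - VOWEL_BASE


def compose_method2(consonant: int, vowel: int) -> int:
    """Consonant + vowel by bitwise OR, e.g. E210 | (E203 & 000F) = E213."""
    return consonant | (vowel & LOW_NIBBLE)


def split_uyirmei(c: int) -> tuple[int, int]:
    """Return (vowel, consonant) for a vowel-consonant, e.g. E213 -> (E203, E210)."""
    return c & VOWEL_MASK, c & CONSONANT_MASK


def classify(c: int) -> str:
    """Classify a TACE16 code point with the explicit range checks (Method 1)."""
    x = c & LOW_NIBBLE
    if 0xE201 <= c <= 0xE20C:
        return "vowel"
    if 0xE210 <= c <= 0xE38C:
        if x == 0:
            return "consonant"
        if 1 <= x <= 12:
            return "vowel-consonant"
    if 0xE180 <= c <= 0xE18C:
        return "Tamil number"
    return "other"


def to_tamil_digit(n: int) -> int:
    """Map a digit 0..9 to its TACE16 Tamil digit."""
    return n | TAMIL_DIGIT_BASE


def from_tamil_digit(c: int) -> int:
    """Map a TACE16 Tamil digit back to 0..9."""
    return c & LOW_NIBBLE


print(hex(compose_method1(0xE210, 0xE203)))     # 0xe213 (கி)
print(hex(compose_method2(0xE210, 0xE203)))     # 0xe213
print([hex(v) for v in split_uyirmei(0xE213)])  # ['0xe203', '0xe210']
print(classify(0xE203), classify(0xE210), classify(0xE213), classify(0xE185))
# vowel consonant vowel-consonant Tamil number
print([hex(to_tamil_digit(d)) for d in (0, 7)])  # ['0xe180', '0xe187']
```

Method 2 and the split each cost a single bitwise operation per character, which is the efficiency the bullets above describe.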

Alternative claims

Open-Tamil

The open-tamil project[10] provides many common operations, such as extracting letters from a UTF-8 encoded Unicode string, sorting and searching. Although the project claims Level-1 compliance for Tamil text processing without using TACE16, it is still built on the extra programming logic that the present Unicode standard for Tamil requires. For example:

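#!/usr/bin/env python3
import tamil.utf8 as utf8

# Split a Unicode Tamil string into letters with open-tamil, print each letter
# and write the letters, separated by spaces, to the file "singl".
# The with-statement closes the file automatically.
with open('singl', 'w', encoding='utf-8') as ff:
    letters = utf8.get_letters("கூவிளம் என்பது என்ன சீர்")
    for letter in letters:
        print(letter)
        ff.write(letter + ' ')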

This generates the following output:
கூ வி ள ம் எ ன் ப து எ ன் ன சீ ர்

See also

  • TSCII (Tamil Script Code for Information Interchange)
  • AnyTaFont2UTF8 - an open-source project covering character mappings for all Tamil encodings and fonts.

References

  1. Report on the final recommendations of the task force on TACE16.
  2. Tamil Nadu Government's tender document for the development of Tamil fonts and a Tamil keyboard driver for 16-bit encodings (Unicode and TACE16).
  3. "Tamil Fonts | Tamil Virtual Academy" (தமிழ் எழுத்துருக்கள் | தமிழ் இணையக் கல்விக்கழகம்).
  4. Tamil Nadu Government's Order (G.O.), Keyboard Drivers and Fonts.
  5. open-tamil. https://github.com/arcturusannamalai/open-tamil
  6. tamil.utf8.get_letters. https://ezhillang.wordpress.com/2014/01/26/open-tamil-text-processing-%E0%AE%89%E0%AE%B0%E0%AF%88-%E0%AE%AA%E0%AE%95%E0%AF%81%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AE%BE%E0%AE%AF%E0%AF%8D%E0%AE%B5%E0%AF%81/
  7. https://ezhillang.wordpress.com/2014/01/26/open-tamil-text-processing-%E0%AE%89%E0%AE%B0%E0%AF%88-%E0%AE%AA%E0%AE%95%E0%AF%81%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AE%BE%E0%AE%AF%E0%AF%8D%E0%AE%B5%E0%AF%81/ [user-generated source]
  8. https://www.unicode.org/L2/L2012/12033-tamil-presentation.pdf
  9. "Archive of Notices of Non-Approval".
  10. open-tamil project. https://pypi.org/project/Open-Tamil/

Retrieved from https://en.wikipedia.org/w/index.php?title=Tamil_All_Character_Encoding&oldid=1118364886