[Fsf-friends] Indic language bugs in Unicode
Ramanraj K
ramanraj.k@[EMAIL-PROTECTED]
Mon Nov 20 17:49:13 IST 2006
Recently, I started using UTF-8 enabled applications to read and write
in Tamil, the local official language here. It appears that Indic
languages have been incorrectly represented in Unicode. India sent fewer
than 128 characters for each language to the Unicode Consortium in the
1990s, much less than the full complement of characters in each. For
example, among Tamil characters, only 31 (12 vowels, 18 consonants and
1 final (ஃ)) have specific codes, and the chart omits almost 12 x 18
combined consonant-vowel characters, which now have to be encoded as
sequences of three to nine bytes per character in UTF-8. To make things
worse, the characters are not arranged in any natural order, so sorting
is difficult. It appears to be difficult to amend the charts now, as a
number of applications have started using the Unicode code charts.
Almost all Indic languages have the same problem.
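
To make the byte counts concrete, here is a minimal Python sketch. The
helper name utf8_cost and the sample syllables are my own choices for
illustration and do not come from any standard or from the documents
linked below. It counts code points and UTF-8 bytes for a bare
consonant, for two combined characters, and for the canonically
decomposed (NFD) form of one of them.

```python
import unicodedata

def utf8_cost(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))

# Hypothetical sample set, chosen only for illustration.
samples = {
    "ka": "\u0b95",                     # க  consonant alone
    "ki": "\u0b95\u0bbf",               # கி consonant + vowel sign I
    "ko": "\u0b95\u0bca",               # கொ consonant + vowel sign O
    # The same syllable after canonical decomposition: the vowel sign O
    # splits into two code points, so the syllable needs three.
    "ko_nfd": unicodedata.normalize("NFD", "\u0b95\u0bca"),
}

for name, text in samples.items():
    code_points, byte_count = utf8_cost(text)
    print(name, code_points, "code points,", byte_count, "bytes")
```

Every code point in the Tamil block takes three bytes in UTF-8, so the
cost of a written character depends only on how many code points it
needs.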
Some would now like to have a 16-bit encoded Tamil-New chart, with codes
allocated for 250+ characters in the Private Use Area. I am not sure
whether other Indic language groups are aware of these issues, or what
their plans are to deal with them.
Padmakumar raised these issues on the fsf-friends mailing list in 2004:
http://mm.gnu.org.in/pipermail/fsf-friends/2004-December/002653.html
along with a link to the article at:
http://www.angelfire.com/empire/thamizh/2/
(Sadly, there was no response to it.)
A recent TVU conference document on these issues is available at:
http://tamilvu.org/coresite/html/cwwhatnw.htm
There are a number of things that need to be done:
[1] Add any missing characters and re-arrange the Tamil Unicode
characters within the existing range of 128 so that sorting can be
done.
[2] Examine the TVU document and offer suggestions to those concerned
regarding 16-bit Tamil encoding.
[3] Almost all Indic languages are in the same boat here, and therefore
the language groups ought to come up with workable plans to remove the
problems.
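
On the sorting point in [1], the sketch below shows how plain code point
order differs from the traditional order of the 18 Tamil consonants, and
how an explicit order table can be used as a sort key. The names
CONSONANT_ORDER and traditional_key are hypothetical, and the table
covers consonants only. This is a rough illustration under those
assumptions, not a complete collation scheme.

```python
# Traditional order of the 18 consonants, written out by hand.
# This table is an assumption of the sketch, not taken from a library.
CONSONANT_ORDER = "கஙசஞடணதநபமயரலவழளறன"
RANK = {ch: i for i, ch in enumerate(CONSONANT_ORDER)}

def traditional_key(word):
    """Sort key: rank each character by its position in the table.

    Characters outside the table sort after all consonants,
    ordered among themselves by code point.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]

letters = ["\u0ba9", "\u0baa", "\u0bb1", "\u0bb2"]  # ன ப ற ல

by_code_point = sorted(letters)                       # ன ப ற ல
by_tradition = sorted(letters, key=traditional_key)   # ப ல ற ன
```

With code point order, ன sorts before ப even though it is the last
consonant in the traditional order, so any application that compares
raw code points will get this wrong unless it carries a table of this
kind.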
-Ramanraj K