[Fsf-friends] Indic language bugs in Unicode

Sankarshan Mukhopadhyay sankarshan.mukhopadhyay@[EMAIL-PROTECTED]
Tue Nov 21 09:54:48 IST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I apologise for the top-posting, but would you like to go through the
discussions on the indic at unicode.org mailing list? There seems to be
a lot of dialogue on this issue (which is normal and as it should be),
but no clear solution has been proposed.

:SM

Ramanraj K wrote:
> Recently, I started using UTF-8 enabled applications to read and write
> in Tamil, the local official language here. It appears that Indic
> languages have been incorrectly represented in Unicode. India sent
> fewer than 128 characters per language to the Unicode Consortium in the
> 1990s, far fewer than the full complement of characters in each. For
> example, among Tamil characters, only 31 (12 vowels, 18 consonants and
> 1 final, ஃ) have specific codes, and the chart omits almost 12 x 18
> compound characters, which now have to be encoded with three to nine
> bytes per character in UTF-8. To make things worse, their arrangement
> is not in any natural order, so sorting is difficult. It appears to be
> difficult to amend the charts now, as a number of applications have
> started using the Unicode code charts. Almost all Indic languages have
> the same problem.
> 
> Some would now like to have a 16-bit encoded Tamil-New chart, with
> codes allocated for 250+ characters in the Private Use Area. I am not
> sure whether other Indic language groups are aware of the issues here,
> or what their plans are to deal with them.
> 
> Padmakumar pointed out these issues on the fsf-friends mailing list
> in 2004:
> 
> http://mm.gnu.org.in/pipermail/fsf-friends/2004-December/002653.html
> along with a link to the article at:
> http://www.angelfire.com/empire/thamizh/2/
> (sadly, there was no response to it)
> 
> A recent TVU conference document on these issues is available at:
> http://tamilvu.org/coresite/html/cwwhatnw.htm
> 
> There are a number of things that need to be done:
> [1] Add any missing characters and re-arrange the Tamil Unicode
> characters within the existing range of 128 code points so that
> sorting works.
> [2] Examine the TVU document and offer suggestions to those concerned
> regarding the Tamil 16-bit encoding.
> [3] Almost all Indic languages are in the same boat here, and
> therefore the language groups ought to come up with workable plans to
> remove the problems.
> 
> -Ramanraj K
> 
> 
> 
> 
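
To put rough numbers on the byte counts and the sorting complaint in the
quoted message, here is a small Python sketch. It is only an
illustration, not a proposal: the names sizes, TRADITIONAL, RANK and
traditional_key are made up for this example, and the sort key covers
nothing beyond the traditional order of the 18 base consonants. It
assumes the text is stored in UTF-8 and uses only the standard
unicodedata module.

```python
import unicodedata

KA = "\u0b95"       # க  TAMIL LETTER KA
SIGN_AA = "\u0bbe"  # ா  TAMIL VOWEL SIGN AA
SIGN_O = "\u0bca"   # ொ  TAMIL VOWEL SIGN O


def sizes(text):
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


print(sizes(KA))                                          # (1, 3)
print(sizes(KA + SIGN_AA))                                # (2, 6)
print(sizes(KA + SIGN_O))                                 # (2, 6)
# In decomposed form (NFD) the O sign splits into two vowel signs.
print(sizes(unicodedata.normalize("NFD", KA + SIGN_O)))   # (3, 9)

# The 18 consonants in traditional order (hypothetical helper table):
# க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
TRADITIONAL = (
    "\u0b95\u0b99\u0b9a\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa"
    "\u0bae\u0baf\u0bb0\u0bb2\u0bb5\u0bb4\u0bb3\u0bb1\u0ba9"
)
RANK = {ch: i + 1 for i, ch in enumerate(TRADITIONAL)}


def traditional_key(word):
    """Toy sort key: rank the 18 consonants traditionally and leave
    every other character in code point order."""
    return [(0x0B95, RANK[ch]) if ch in RANK else (ord(ch), 0)
            for ch in word]


letters = ["\u0baa", "\u0ba9", "\u0b95"]      # ப, ன, க
print(sorted(letters))                        # code point order: க, ன, ப
print(sorted(letters, key=traditional_key))   # traditional order: க, ப, ன
```

The second half suggests that sort order can be fixed with a collation
key layered over the existing code points, without re-arranging the
chart. A real application would more likely rely on an implementation
of the Unicode Collation Algorithm (for example the ICU library) than
on a hand-written table like this one.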


- --

You see things; and you say 'Why?';
But I dream things that never were;
and I say 'Why not?' - George Bernard Shaw
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFFYn+PXQZpNTcrCzMRAqNCAJ4hLaJOHCtp9d6PKbHxTPQVIlD3QACcCIlq
QupDAiffFf/gNfM4tOKQFNc=
=zN9W
-----END PGP SIGNATURE-----