[FSF India] ISCII vs Unicode [was Localization of GNU/Linux to Indian languages]

Rajkumar S. fsf-india@gnu.org.in
Sun, 9 Sep 2001 15:00:05 +0530 (IST)


On 8 Sep 2001, Ashhar Farhan wrote:

> 1. unicode was designed by people who dont even speak the
> languages. as a result it is incomplete and ambiguous. that is,

The Unicode code charts and encodings for Indic scripts are a ditto
copy of the ISCII standard. So if there are any problems with Unicode
for Indic scripts, they are due to ISCII.

> there are a number of ways to write the same words as a result is
> a schumck when it comes to searching for strings. some words
> cannot be written at all.

Can you give some examples?
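
One case that may be what is meant here is the Devanagari nukta
letters, which can be stored either as a single code point or as a base
letter followed by the nukta sign. A minimal sketch in Python, assuming
the standard unicodedata module; the helper name same_text is made up
for illustration:

```python
import unicodedata

# Two encodings of the same Devanagari letter QA (ka with nukta).
precomposed = "\u0958"        # DEVANAGARI LETTER QA, one code point
decomposed = "\u0915\u093C"   # LETTER KA followed by SIGN NUKTA


def same_text(a: str, b: str) -> bool:
    """Compare two strings after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison sees two different code point sequences.
raw_equal = precomposed == decomposed

# Normalizing both sides first makes the two spellings compare equal.
normalized_equal = same_text(precomposed, decomposed)

print(raw_equal, normalized_equal)
```

A search that normalizes both the text and the query before comparing
would find either spelling. Whether this covers the cases Farhan has in
mind is a separate question.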

> try composing srimad bhagvad geeta in unicode to know what i mean.
> for years, i have not only contributed to unicode but also argued
> with them to get real. with no response.

Srimad Bhagavat Geeta or the Vedas use characters that are not present
in the Unicode standard for Devanagari. But they were not present in
ISCII when Unicode adopted it. If ISCII has added them later, then they
have to be added to Unicode also. A deficiency is not an excuse for not
using Unicode. It has to be corrected.


> among the people who share this perception are also dr. raymond
> doctor (the linguist par excellence at the famous deccan college,
> who is recognised authority on this), dr. rajeev sanghal (who is
> the one person who knows all the encodings of indian languages),
> dr. mohan tambe (now with innomedia) who designed the ISCII and
> (happiness!) the DOT.

Yes, I know there are a lot more also. I had a discussion about this
with a team from IIIT Hyd, who were fanatical about ISCII.

> the future of indian language computing will be determined here,
> by us who live and breath these languages. not some guys who cant
> even spell their own names in the languages that they claim to be
> 'experts' in.

Tricolor waving is fine but it is foolish to live in our own world and
ignore what is happening outside.

> wow well said! so, next i guess, that our spellings will also be
> governed by what the jerks put into msword hindi edition's spell
> checker.

If I understand correctly, these "jerks" are a team from NSCT. And yes,
if we ignore this, our spellings WILL be determined by Hindi software,
MS Word included.

> the government of india will NOT accept unicode as it is.

The Ministry of IT, GoI is a member of the Unicode Consortium. So they
are trying to change Unicode to work the way it should, rather than
pretending that it does not exist.

> and how many copies of pango exist in the world? how many sites
> and texts have been converted to pango? how many fonts are
> available for pango's own version of unicode?

Pango will be a part of GTK 1.3 and thus the default
internationalization framework of GNOME. Fonts are not a problem, as
fonts and character encodings are different things. The existing fonts
can be used.

> yeah. as i said the challenge is simple, show me a fully rendered
> and searchable bhagvad geeta written in unicode.

Show me an ISCII document of bhagvad geeta with a Japanese translation.
I wonder if ISCII can handle all the Indian languages in a single text
file without higher-level markup?

> better yet, take a list of names from the northeastern territories
> another list from hyderabad's muslim quarters try printing them in
> devanagri. does it work?

I haven't tried this. Can you send some example names?

> yeah! and which DTP program will they use? which database will
> they use ? (unicode cannot sort indian languages for nuts).

For typesetting they use TeX, or rather Omega, the 16-bit extension of
TeX, which can typeset all of the world's languages on a single page,
for example a page with Malayalam, Arabic, Japanese and Mongolian.
These scripts are not all written in the same direction.

Postgres already supports UTF-8 but more work needs to be done to get
sorting etc done.
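
To show why sorting needs more than UTF-8 storage, here is a small
sketch in Python. It only compares plain code point order with order
after normalization; it is not a real collation, and the three sample
words are my own choice for illustration:

```python
import unicodedata

kamal = "\u0915\u092E\u0932"   # begins with LETTER KA (U+0915)
hal = "\u0939\u0932"           # begins with LETTER HA (U+0939)
qalam = "\u0958\u0932\u092E"   # begins with precomposed LETTER QA (U+0958)

words = [qalam, hal, kamal]

# Plain sort compares code points, so the U+0958 word lands after HA.
by_code_point = sorted(words)

# Sorting on the NFC form turns U+0958 into KA + NUKTA, which moves the
# word next to the other KA word.
by_normalized = sorted(words, key=lambda w: unicodedata.normalize("NFC", w))

print(by_code_point == [kamal, hal, qalam])
print(by_normalized == [kamal, qalam, hal])
```

Proper language-aware ordering would need a collation table per
language on top of this, which is the work that remains to be done.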

Btw, can you give an example of a single piece of Free software that
runs on a Free OS and handles DTP and databases in ISCII?

> btw, the Hindi Prachar sabha and the National Council for
> Promotion of Urdu language have standardized on ISCII and PASCII.

All these are bound to change, the sooner the better.

> it isnt that i hate unicode. it is just that it doesnt work. try
> sorting a series of urdu names starting with s and sh.

I do not hate ISCII, but the concept of an 8-bit encoding is *outdated*:
it cannot handle all the Indian languages in one text file without
higher-level markup. I do not say that Unicode is problem free or that
software that can handle it exists. But the concept of a 16-bit
encoding is too powerful to ignore.
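
As a rough illustration of the single-file point, the sketch below
(Python, standard library only) puts Devanagari, Malayalam and Japanese
in one string with no markup between them. The sample greetings and the
helper name scripts_used are my own assumptions, and taking the first
word of each character name is only a crude stand-in for a real script
lookup:

```python
import unicodedata

# Three greetings in three scripts, in one plain string, no markup.
hindi = "\u0928\u092E\u0938\u094D\u0924\u0947"
malayalam = "\u0D28\u0D2E\u0D38\u0D4D\u0D15\u0D3E\u0D30\u0D02"
japanese = "\u3053\u3093\u306B\u3061\u306F"
text = " ".join([hindi, malayalam, japanese])


def scripts_used(s: str) -> set:
    """Return the first word of each character name, e.g. 'DEVANAGARI'."""
    return {unicodedata.name(ch).split()[0] for ch in s if not ch.isspace()}


# The same text survives a round trip through UTF-8 and UTF-16.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")
round_trip_ok = (
    utf8_bytes.decode("utf-8") == text
    and utf16_bytes.decode("utf-16-le") == text
)

scripts = scripts_used(text)
print(sorted(scripts), round_trip_ok)
```

Each character carries its own identity, so nothing in the file has to
announce which script comes next.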

btw does ISCII handle "chillu" letters in Malayalam?

raj