Re: Comments on Microsoft Open Source document

Theodore Y. Ts'o (tytso@mit.edu)
Mon, 9 Nov 1998 18:21:41 GMT


Date: Sat, 7 Nov 1998 22:22:36 -0800 (PST)
From: Alex Belits <abelits@phobos.illtel.denver.co.us>

4. Unicode makes a displaying problem non-issue (all characters are in
one huge font) at the price of modifying all string-handling routines.
That however includes complete incompatibility with existing charsets,
and lack of language-labeling.

The reality is that you have to modify all string-handling routines
anyway, because of languages like Chinese where 8-bit characters simply
aren't enough.

You also want to be able to handle multiple languages using different
character sets inside one particular document, which is in fact
*simpler* to do with Unicode, since it's all (as you put it) one
gigantic font. If you don't do this, you end up needing to have magic
character-set switching escape sequences (or MIME-style headers, or some
other complex solution), and your string and display routines end up
getting just as complex, if not more so.
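As a rough illustration of the escape-sequence point, here is a small Python sketch (not part of the original discussion). The sample string and the helper name uses_escape_switching are made up for illustration. It compares ISO-2022-JP, a stateful encoding that switches character sets with ESC sequences, against UTF-8, assuming Python's standard "iso2022_jp" codec.

```python
# Hypothetical sample text: English and Japanese in one string.
MIXED = "Hello, 日本語"

ESC = b"\x1b"  # the byte that introduces an ISO-2022 charset-switching sequence


def uses_escape_switching(text: str, codec: str) -> bool:
    """Return True if encoding `text` with `codec` emits any ESC bytes."""
    return ESC in text.encode(codec)


if __name__ == "__main__":
    for codec in ("iso2022_jp", "utf-8"):
        encoded = MIXED.encode(codec)
        print(codec, encoded, "ESC present:", uses_escape_switching(MIXED, codec))
```

With the stateful encoding, any routine that slices or searches the byte string has to track which character set is currently selected. With UTF-8 there is no such shift state to track.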

The bottom line is that doing internationalization is hard. As one I18N
expert was heard to say, "It would be easier to teach them all English."
Any solution will end up impacting some people more than others. It is
no doubt true that UTF-8 may end up impacting certain people more than
others. But the backwards compatibility aspects of UTF-8, combined with
the undeniable preponderance of where computer systems are deployed
(i.e., U.S. and Europe) mean that it was inevitable that UTF-8 would be
chosen as the most pragmatic solution which impacts the smallest number
of people and allows for the easiest transition to full I18N support.
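The backwards-compatibility property can be sketched in a few lines of Python (again an illustration only; the helper names ascii_unchanged and high_bytes_only are hypothetical). Pure ASCII text encodes to the same bytes under UTF-8, and every byte of a non-ASCII character has its high bit set, so it can never be mistaken for an ASCII byte such as '/' or NUL.

```python
def ascii_unchanged(text: str) -> bool:
    """True if ASCII-only `text` has identical bytes in ASCII and UTF-8."""
    return text.encode("ascii") == text.encode("utf-8")


def high_bytes_only(ch: str) -> bool:
    """True if every UTF-8 byte of `ch` is >= 0x80 (never an ASCII byte)."""
    return all(b >= 0x80 for b in ch.encode("utf-8"))


if __name__ == "__main__":
    print(ascii_unchanged("unsubscribe linux-kernel"))
    for ch in ("é", "中"):
        print(repr(ch), ch.encode("utf-8").hex(), high_bytes_only(ch))
```

This is why existing 8-bit-clean code that only treats ASCII bytes specially (path separators, string terminators) tends to keep working on UTF-8 data without modification.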

From where I sit, Microsoft wasn't the only company pushing Unicode; the
push for Unicode and UTF-8 came from all directions, not just Microsoft.
Or are you going to claim that the developers of Perl and X are pawns of
Microsoft? Instead, it seems pretty clear that Perl and X chose UTF-8
because it's the sanest way to make the very hard transition from 8-bit
characters to supporting internationalization, including character sets
that simply won't fit in 256 character slots.

Finally, what in the world does this have to do with the Linux kernel?
Followups to /dev/null, please.

- Ted
