Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

H. Peter Anvin (hpa@transmeta.com)
28 Aug 1997 00:09:41 GMT


Followup to: <199708272103.OAA25596@connectnet1.connectnet.com>
By author: Darin Johnson <darin@connectnet.com>
In newsgroup: linux.dev.kernel

> Ok, the problem is, each person, or group, needs their own private
> extensions to the charsets. This makes information interchange
> difficult. This is because you have to ensure that everyone you send
> your document to understands your private-use plane! Now if you
> standardize within a country, then you've essentially created a new
> charset, with unicode as the base; but the whole point of unicode was
> to avoid multiple character sets in the first place.
>
> That's essentially what he's saying - private character sets are
> nearly useless for communication; and if used you end up with multiple
> character sets all over again.

Well, you have the same problem with multiple character sets: how do
you tag them?

> "Myspiffycharacterset" is the problem. The government of a major
> economic power should not be required to invent a "spiffy" character
> set just to send internal memos. You shouldn't need to invent a
> spiffy character set to address a letter to someone in Tokyo or
> Beijing. You should use the private space to support Klingon, not
> Chinese.

I agree, they shouldn't, which is why U+20000 to U+2FFFF is currently
being defined as additional Chinese characters not already present in
the Basic Multilingual Plane. Note that this range of 65,536 code
positions is more than *any* Han-using country allows for in its own
national standards.

However, if you want to support multiple character sets
simultaneously, you have to tag them somehow, and mapping them within
UCS-4 is one way of doing such tagging. ISO 2022 is another.
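
As a rough sketch of what "tagging by mapping within UCS-4" could look
like: give each extra character set its own block of UCS-4 values, so
that the block a value falls in identifies the set. The charset names,
base values, and block size below are invented purely for illustration;
they are not taken from any standard or registry.

```python
# Hypothetical assignment: each charset gets its own 64K block of
# UCS-4 values. These bases are arbitrary illustrative numbers.
BLOCK_SIZE = 0x10000
CHARSET_BASE = {
    "charset_a": 0x60000000,
    "charset_b": 0x60010000,
}


def tag(charset, code):
    """Map a charset-local code to a single UCS-4 value."""
    if not 0 <= code < BLOCK_SIZE:
        raise ValueError("code does not fit in one block")
    return CHARSET_BASE[charset] + code


def untag(ucs4):
    """Recover (charset, local code) from a tagged UCS-4 value."""
    for name, base in CHARSET_BASE.items():
        if base <= ucs4 < base + BLOCK_SIZE:
            return name, ucs4 - base
    raise ValueError("value is not in any known block")


value = tag("charset_b", 0x3021)
print(hex(value), untag(value))
```

The point is only that the tag travels with every character, whereas
ISO 2022 carries it in escape sequences that switch the active set.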

> So - back to the *kernel*: I would think that even supporters of
> unicode, at this point, can see that it is still a very contentious
> standard, controversial enough that there is a high possibility of the
> standards changing a lot in the future, or the standard being ignored
> by lots of people (and an unused standard isn't really a standard
> anymore). Thus it's too controversial for standardization inside of
> Linux. Leave it to user space libraries and linux distributions.
> Later, if unicode does become widely accepted, then think about adding
> it in the kernel.
>
> (and for heavens sake, if someone does add it to the kernel; make it
> a compile time option!!!)

The kernel shouldn't need to know about character sets (the console
terminal emulator and foreign filesystems being the unfortunate
exceptions). The issue is therefore academic.

-hpa

-- 
    PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
    See http://www.zytor.com/~hpa/ for web page and full PGP public key
Always looking for a few good BOsFH.  **  Linux - the OS of global cooperation
        I am Baha'i -- ask me about it or see http://www.bahai.org/