Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

H. Peter Anvin (hpa@transmeta.com)
27 Aug 1997 06:00:47 GMT


Followup to: <199708270150.KAA09709@megatherium.mri.co.jp>
By author: NIIBE Yutaka <gniibe@mri.co.jp>
In newsgroup: linux.dev.kernel
>
> I agree with you on many technical points. Yes, I use a language
> not supported by ISO 8859-1 every day (Japanese), and we've been
> struggling with character problems on computers for years. My
> experience includes editors, e-mail, NetNews, and so on. Actually,
> today I am attending a meeting related to ISO 10646-2 activity. Besides,
> we're currently merging MULE (Multilingual Enhancement) features into
> the forthcoming Emacs 20. You may know that it supports
> multiple character sets and native encodings. Support for multiple
> character sets and native encodings is necessary for backward
> compatibility and information interchange.
>
> However, I don't care much about the internal encoding in an
> application. Personally, I think that the current implementation of
> the internal character encoding in Emacs 20 and the UTF-8 encoding
> are similar. The difference is that the character encoding in
> Emacs 20 encodes multiple character sets, while UTF-8 encodes UCS.
> How about using the UTF-8 scheme for multiple character sets? IMHO,
> it's the way to go. My rationale for multiple character sets is that
> it's very difficult to collect and maintain a large character set
> (I don't think Unicode 2.0 is large enough). In China, there have
> been projects for defining character sets for a thousand years...
>

Trivial. Pick a range out of the *thousands* of private-use planes in
UCS-4, and map your character set(s) onto them. Then encode the whole
thing in UTF-8. Done.
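
A rough sketch of what that could look like, in Python. Everything named
here is hypothetical: PRIVATE_BASE assumes the private-use groups of the
original UCS-4 layout start at 0x60000000, and map_to_private() simply
gives each 16-bit character set its own plane above that base. The
encoder is the 1-to-6-byte UTF-8 form defined for values up to
0x7FFFFFFF; later revisions of UTF-8 stop at U+10FFFF, so current
decoders will reject the longer sequences.

```python
# Assumption: private-use space in the original UCS-4 layout begins here.
PRIVATE_BASE = 0x60000000
# Assumption: 32 private groups x 256 planes are available for mapping.
MAX_CHARSETS = 32 * 256


def map_to_private(charset_id, code):
    """Map a 16-bit code from character set `charset_id` into its own plane."""
    if not 0 <= charset_id < MAX_CHARSETS:
        raise ValueError("charset_id out of range")
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return PRIVATE_BASE + (charset_id << 16) + code


def utf8_encode_ucs4(cp):
    """Encode a value in 0..0x7FFFFFFF using the 1-to-6-byte UTF-8 form."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper limit, lead-byte marker, number of continuation bytes)
    forms = (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
        (0x80000000, 0xFC, 5),
    )
    for limit, lead, n in forms:
        if cp < limit:
            out = [lead | (cp >> (6 * n))]
            for i in range(n - 1, -1, -1):
                # Each continuation byte carries 6 payload bits.
                out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(out)


# Example: code 0x3021 from hypothetical character set number 1.
ucs = map_to_private(1, 0x3021)
encoded = utf8_encode_ucs4(ucs)
print(hex(ucs), encoded.hex())
```

For values that still fit in today's range the output matches an ordinary
UTF-8 encoder, so only the privately mapped characters need the long form.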

-hpa

-- 
    PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
    See http://www.zytor.com/~hpa/ for web page and full PGP public key
Always looking for a few good BOsFH.  **  Linux - the OS of global cooperation
        I am Baha'i -- ask me about it or see http://www.bahai.org/