Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

NIIBE Yutaka (gniibe@mri.co.jp)
Wed, 27 Aug 1997 21:26:47 +0900


I wrote:
> However, I don't care much about internal encoding in the application.
> Personally, I think that current implementation of internal character
> encoding in Emacs-20 and UTF-8 encoding is similar. The difference is
> character encoding in Emacs-20 encodes multiple character sets, while
> UTF-8 encodes UCS. How about using UTF-8 scheme for multiple
> character sets? IMHO, it's the way to go. My rationale for multiple
> character sets is that it's very difficult to collect and maintain
> large character set (I don't think Unicode 2.0 is large enough). In
> China, there've been projects for defining character set since
> thousand years ago...

H. Peter Anvin writes:
> Trivial. Pick a range out of the *thousands* of private-use planes in
> UCS-4, and map your character set(s) onto them. Then encode the whole
> thing in UTF-8. Done.

Yes. But I'm afraid we are talking about different things here.
I hope we can share some ideas and experiences. My point is about
the need to handle multiple character sets (simultaneously).

The naive approach of using private-use planes does solve some
problems: yes, each person can use his/her own character set(s).
However, for information interchange, we have to send information
about the character set itself along with the text.
So it seems to me that this is, in fact, a multiple character set system.
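
To make this concrete, here is a minimal sketch in Python of the
private-use approach. It is only an illustration under assumptions:
it uses the plane 15 private-use range of present-day Unicode
(U+F0000..U+FFFD at the end of that plane, i.e. up to U+FFFFD) rather
than the UCS-4 private-use planes discussed above, and the names
encode_private, decode_private, REGISTRY, "charset-a" and "charset-b"
are all hypothetical.

```python
# Assumption: plane 15 (Supplementary Private Use Area-A) of present-day
# Unicode stands in for "a range out of the private-use planes".
PUA_FIRST = 0xF0000
PUA_LAST = 0xFFFFD


def encode_private(indices):
    """Map character-set indices onto the private-use range, then UTF-8."""
    chars = []
    for index in indices:
        code_point = PUA_FIRST + index
        if index < 0 or code_point > PUA_LAST:
            raise ValueError(f"index {index} does not fit in the range")
        chars.append(chr(code_point))
    return "".join(chars).encode("utf-8")


def decode_private(data, charset_name, registry):
    """Recover characters; impossible without knowing which set was used."""
    table = registry[charset_name]
    result = []
    for ch in data.decode("utf-8"):
        code_point = ord(ch)
        if not PUA_FIRST <= code_point <= PUA_LAST:
            raise ValueError(f"U+{code_point:04X} is not in the range")
        result.append(table[code_point - PUA_FIRST])
    return result


# Hypothetical character sets that two people mapped onto the same range.
REGISTRY = {
    "charset-a": ["A-glyph-0", "A-glyph-1"],
    "charset-b": ["B-glyph-0", "B-glyph-1"],
}

payload = encode_private([0, 1])
print(decode_private(payload, "charset-a", REGISTRY))
print(decode_private(payload, "charset-b", REGISTRY))
```

The same bytes decode to different characters depending on the
character set name passed in, so that name has to travel with the text.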

Besides, I'm afraid that some people would consider using UCS-4 in
such a way an abuse of UCS-4. If that is not a problem, standardization
of the handling of multiple character sets in UCS-4 is the way to go.

Thanks,

-- 
NIIBE Yutaka