Re: unicode (char as abstract data type)

Albert D. Cahalan (acahalan@cs.uml.edu)
Fri, 17 Apr 1998 21:38:15 -0400 (EDT)


Alan Cox writes:

>> UTF-8 is also dead.
>
> Nope. UTF8 is alive and well. It's also the only encoding validly
> usable for unix file naming where the / and 0 character rules
> are laid down by POSIX and the single unix specification.

Fine. Apps that comply with 1998 standards can get UTF-8 (or KOI-8)
even if the kernel interface uses UCS2. We have libc to handle such
troubles. Portable apps don't have assembly code to make direct
calls to the kernel.

As far as portable apps are concerned, the kernel interface could
use a 13-bit encoding with the data going backwards. It could use
an array of Pascal strings, with one string for each path component.
It doesn't matter, because apps use libc.
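
To make that concrete, here is a rough sketch of the kind of conversion
libc could do when handing a name from a UCS2 kernel interface to a
UTF-8 app. The function name ucs2_name_to_utf8 and its signature are
made up for illustration; nothing like it exists in libc today, and a
real version would need to decide what to do with names that do not
fit the caller's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical libc-side helper (not a real libc function):
 * encode n UCS-2 code units as a NUL-terminated UTF-8 string.
 * Returns the number of bytes written, not counting the NUL,
 * or (size_t)-1 if dst is too small or the input is not valid UCS-2.
 */
static size_t ucs2_name_to_utf8(const uint16_t *src, size_t n,
                                char *dst, size_t cap)
{
    size_t o = 0;

    assert(dst != NULL);
    if (cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < n; i++) {
        uint16_t c = src[i];
        size_t need;

        /* UCS-2 has no surrogates, so treat them as errors. */
        if (c >= 0xD800 && c <= 0xDFFF)
            return (size_t)-1;

        need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;
        if (o + need >= cap)            /* keep room for the NUL */
            return (size_t)-1;

        if (need == 1) {
            dst[o++] = (char)c;
        } else if (need == 2) {
            dst[o++] = (char)(0xC0 | (c >> 6));
            dst[o++] = (char)(0x80 | (c & 0x3F));
        } else {
            dst[o++] = (char)(0xE0 | (c >> 12));
            dst[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    dst[o] = '\0';
    return o;
}
```

An app that only ever sees the char * result has no way to tell what
the kernel side looked like.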

Many apps can bypass the whole conversion process if the kernel
uses a UCS2 interface. That includes all Java apps, Wine, and
most new code ported from Win00. It could include native apps too,
if they use non-standard (or Unix 2000 standard) API calls.

If the kernel interface uses UTF-8, then apps can _never_ bypass
the conversion overhead. We could be stuck with it for many years.
It is much better to put the UTF-8 conversion in libc.

What if libc needs to present KOI-8 data to an app? With a UTF-8
kernel API, libc converts to KOI-8 by going through UCS2 first!
If UCS2 is used by the kernel interface, libc can convert directly
to KOI-8.
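
A sketch of the two paths, in case the difference is not obvious. The
function names are invented for this example, the KOI8-R table is
deliberately partial (ASCII plus a handful of Cyrillic letters), and
the UTF-8 decoder only handles sequences of up to three bytes. The
point is just that the UTF-8 path has to recover the UCS2 value before
it can do the same table lookup the UCS2 path starts with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Partial, illustrative UCS-2 -> KOI8-R map. Returns -1 if unmapped. */
static int ucs2_to_koi8r(uint16_t c)
{
    if (c < 0x80)
        return c;                 /* ASCII is shared */
    switch (c) {
    case 0x0430: return 0xC1;     /* CYRILLIC SMALL LETTER A   */
    case 0x0431: return 0xC2;     /* CYRILLIC SMALL LETTER BE  */
    case 0x0432: return 0xD7;     /* CYRILLIC SMALL LETTER VE  */
    case 0x0433: return 0xC7;     /* CYRILLIC SMALL LETTER GHE */
    case 0x0434: return 0xC4;     /* CYRILLIC SMALL LETTER DE  */
    case 0x044E: return 0xC0;     /* CYRILLIC SMALL LETTER YU  */
    default:     return -1;
    }
}

/* Decode one UTF-8 sequence (1 to 3 bytes) into a UCS-2 value.
 * Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_next(const unsigned char *s, size_t len, uint16_t *out)
{
    assert(out != NULL);
    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        if (len < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        *out = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return (*out >= 0x80) ? 2 : 0;      /* reject overlong forms */
    }
    if ((s[0] & 0xF0) == 0xE0) {
        if (len < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        *out = (uint16_t)(((s[0] & 0x0F) << 12) |
                          ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        return (*out >= 0x800) ? 3 : 0;     /* reject overlong forms */
    }
    return 0;
}

/* UTF-8 kernel API case: decode to UCS-2 first, then map to KOI8-R.
 * Returns the number of KOI8-R bytes written, or -1 on error. */
static int utf8_to_koi8r(const unsigned char *src, size_t len,
                         unsigned char *dst, size_t cap)
{
    size_t i = 0, o = 0;

    while (i < len) {
        uint16_t c;
        size_t used = utf8_next(src + i, len - i, &c);
        int k;

        if (used == 0 || o >= cap)
            return -1;
        k = ucs2_to_koi8r(c);     /* the step a UCS2 API starts with */
        if (k < 0)
            return -1;
        dst[o++] = (unsigned char)k;
        i += used;
    }
    return (int)o;
}
```

With a UCS2 kernel interface, libc would call something like
ucs2_to_koi8r() on each code unit and be done.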

> UTF8 is also the encoding proposed in
> the draft multilingual DNS extensions

That has nothing to do with the kernel interface.

>> I really don't think it is wise to fight Sun, Microsoft, and Apple
>> on this. We could get screwed much worse than EBCDIC users are.
>> Incompatibility with the rest of the world is just not cool.
>
> What you use for data files is another issue. As has been pointed
> out, Unicode isn't enough for that anyway.

No encoding is enough, although Unicode is closer to "enough" than
your favorite 8-bit encoding is.
