Re: unicode (char as abstract data type)

Albert D. Cahalan (acahalan@cs.uml.edu)
Fri, 17 Apr 1998 20:59:14 -0400 (EDT)


Alex Belits writes:
> On Fri, 17 Apr 1998, Albert D. Cahalan wrote:

>> I really don't think it is wise to fight Sun, Microsoft, and Apple
>> on this. We could get screwed much worse than EBCDIC users are.
>> Incompatibility with the rest of the world is just not cool.
>>
>> The perfect time to switch is while adding 64-bit filesystem calls.
>
> Neither Sun, nor Apple or Microsoft have really converted anything
> to Unicode.

Microsoft uses Unicode in the kernel calls. Note that its C library
can still support dumb 8-bit apps just as well as any other C library.

> Having NTFS filesystem where filenames are already for many years
> supposed to be in Unicode, but used by all software with the
> assumption that only 8 bits of every character matter doesn't
> mean much, so this direction is dead, too.

It is not dead. The Unicode support in the system allows for a
future world without 8-bit apps. The transition may take a decade.
When the transition is done, there won't be so much reencoding
between apps and the kernel.

>> I certainly don't want to see 8-bit kernel calls on Merced.
>
> Then you won't see vi there either.

Oh? The last time I heard, vi accessed system calls via libc.
Very few apps care about the kernel interface. I can think of
strace and maybe gdb.

With the right libc, you could even pretend the kernel used
UTF-8 for the system calls.
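
A minimal sketch of that idea, written in Python for brevity rather than C. It assumes a hypothetical kernel entry point, `kernel_open_ucs2`, that takes file names as 16-bit little-endian code units. The wrapper name `libc_open_utf8` is likewise illustrative and not a real libc function. The application hands over UTF-8 bytes and never sees the kernel's encoding.

```python
# Hypothetical stand-in for a kernel call that takes 16-bit file names.
# It only records what it received so the conversion can be inspected.
def kernel_open_ucs2(name_units: bytes) -> dict:
    if len(name_units) % 2 != 0:
        raise ValueError("name must be a whole number of 16-bit units")
    return {"raw": name_units, "units": len(name_units) // 2}


# Hypothetical libc wrapper: the application passes UTF-8 bytes and the
# wrapper re-encodes them before entering the "kernel".
def libc_open_utf8(path_utf8: bytes) -> dict:
    text = path_utf8.decode("utf-8")          # rejects malformed UTF-8
    if any(ord(ch) > 0xFFFF for ch in text):  # UCS-2 has no surrogate pairs
        raise ValueError("character outside the 16-bit range")
    return kernel_open_ucs2(text.encode("utf-16-le"))


# The caller works purely in UTF-8.
result = libc_open_utf8("/tmp/файл".encode("utf-8"))
```

An application built against such a wrapper cannot tell which encoding sits below it, and that is the whole point: only tools that bypass libc would notice.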

Note that standards generally ignore the distinction between
the kernel and libc. iBCS would be an exception, but we have
to emulate that anyway.

>> Just think about it: WE WILL BE ALONE.
>
> "Everyone" either:
>
> 1. Uses local charset and doesn't mark it anywhere.
>
> 2. Uses locale and marks one charset as current.
>
> 3. Uses MIME and assumes that document bodies and header fields don't
> have mixed charsets within them.
>
> 4. Supports Unicode/UTF-8 strings display through a wrapper that uses
> local charsets (ex: Netscape).
>
> 5. Makes professional typesetting software that handles everything
> internally in a manner, no sane programmer wants to do in the OS.

Those are applications. This is the kernel mailing list.
We have a library called "libc" that provides an interface
between the applications and the kernel. Applications can
still see filenames in KOI-8 if you so desire. (You won't
care what libc does to non-KOI-8 filenames, because you won't
have any such names on your disk.)
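
As a rough illustration of such a libc layer (again in Python, with hypothetical function names), the sketch below assumes KOI8-R as the concrete locale charset and a kernel that stores names as 16-bit little-endian units. Names that fit the charset round-trip unchanged. Replacing unmappable characters with `?` is one assumed policy among several possible ones.

```python
# Hypothetical libc-side translation for a process whose locale charset
# is KOI8-R, on top of a kernel that stores names as 16-bit units.
LOCALE_CHARSET = "koi8_r"   # assumption: taken from the process locale


def name_to_kernel(app_name: bytes) -> bytes:
    """Application bytes (KOI8-R) -> 16-bit little-endian units."""
    return app_name.decode(LOCALE_CHARSET).encode("utf-16-le")


def name_from_kernel(kernel_name: bytes) -> bytes:
    """16-bit units -> application bytes; unmappable characters become '?'."""
    text = kernel_name.decode("utf-16-le")
    return text.encode(LOCALE_CHARSET, errors="replace")


# A KOI8-R name survives the trip through the kernel's encoding.
koi8_name = "файл.txt".encode(LOCALE_CHARSET)
round_trip = name_from_kernel(name_to_kernel(koi8_name))

# A name outside the charset is degraded, not rejected (assumed policy).
foreign = name_from_kernel("日本.txt".encode("utf-16-le"))
```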

Think about the consequences of UTF-8 at the system call level:
the text passed to every system call must first be converted to
UTF-8. This burden is with us forever. Meanwhile, Windows and MacOS
can avoid conversion costs after the world converts to UCS-2.

The world _will_ convert too. As much as you may hate it, you
must realize that when Sun, Microsoft, and Apple agree...
It is only a matter of time -- perhaps a decade.
