Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: pcg
Date: Tue Feb 17 2004 - 02:16:42 EST


On Mon, Feb 16, 2004 at 02:40:25PM -0800, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> Try it with a regular C locale. Do a simple
>
> echo > åäö

Just for your info, though. You can't even input these characters in a C
locale, since your libc (and/or xlib) is unable to handle them (lots of SO
C functions will barf on this one). C is 7 bit only.

> Which, if you think about is, is 100% EXACTLY equivalent to what a UTF-8
> program should do when it sees broken UTF-8.

The problem is that the very common C language makes it a pain to use
this in i18n programs. multibyte functions or iconv will no accept
these, so programs wanting to do what you are expecting to do need to
re-implement most if not all of the character handling of your typical
libc.

Yes, it's possible....

> The two cases are 100% equivalent. We've gone through this before. There
> is a bit of pain involved, but it's not something new, or something
> fundamentally impossible. It's very straightforward indeed.

The "bit" is enourmous, as you can't use your libc for text processing
anymore.

Yes, it works in non-i18n programms, but right now most programs get
i18n support, which means they will all fail to properly handle
non-locale characters.

--
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@xxxxxxxx |e|
-=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
The choice of a GNU generation |
|
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/