Re: UTF-8 practically vs. theoretically in the VFS API (was: Re:JFS default behavior)

From: Linus Torvalds
Date: Tue Feb 17 2004 - 16:29:36 EST




On Tue, 17 Feb 2004, John Bradford wrote:
> >
> > Wrong. UTF-8 can store UCS-4 characters just fine.
>
> Does just fine include unambiguously?

If you don't care about backwards compatibility, then yes. You just have
to use "strict" UTF-8.

> Sure, standards-conforming
> UTF-8 is unambiguous, but you've already said time and again that that
> doesn't happen in the real world. I just don't agree on the UTF-8 can
> store UCS-4 characters just fine thing _at all_.

You get to choose between "throw the baby out with the bathwater" or "be
compatible".

Sane people choose compatibility. But it's your choice. You can always
normalize thing if you want to - but don't complain to me if it breaks
things. It will still break _fewer_ things than UCS-4 would, so even if
you always normalize you'd still be _better_ off with UTF-8 than you would
be with UCS-4.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/