Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: Linus Torvalds
Date: Tue Feb 17 2004 - 16:02:02 EST

On Tue, 17 Feb 2004, John Bradford wrote:
>
> Why not:

I'll start with the first one. That already kills the rest.

> * State that filenames are strings of 32-bit words. UCS-4 should be
> the preferred format for storing text in them, but storing legacy
> encodings in the low 8 bits is acceptable, (but a Bad Thing for new
> installations).

UCS-4 is as braindamaged as UCS-2 was, and for all the same reasons.

It's bloated, non-expandable, and not backwards compatible.

In contrast, UTF-8 doesn't measurably expand any normal text that didn't
need it, is backwards compatible in the major ways that matter, and can be
extended arbitrarily.

UCS-4 has _zero_ advantages over UTF-8.

Please. Give it up. Anybody who thinks that _any_ other encoding format
than UTF-8 is valid is just _wrong_.

(Now, I'll grant that a lot of people don't like Unicode, so I'll allow
that maybe you'd want to use the UTF-8 _encoding_scheme_ for some other
mapping, but I don't see that that is worth the pain any more. Unicode may
be a horrible enumeration, but in the end all font encodings are arbitrary
anyway, so the Unicode haters might as well start giving up).

In short: even if you hate Unicode with a passion, and refuse to touch it
and think standards are worthless, you should still use the same
transformation that UTF-8 does to your idiotic character set of the day.
Because the _transform_ makes sense regardless of character set encoding.

Linus