Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: John Bradford
Date: Tue Feb 17 2004 - 15:55:57 EST


Quote from Jamie Lokier <jamie@xxxxxxxxxxxxx>:
> John Bradford wrote:
> > However, I don't see why it is any more logical to make the suggestion
> > that filenames generally be treated as UTF-8, IFF they are text at
> > all, than it is to suggest that filename should be arbitrary strings
> > of 32-bit words.
>
> Ok, but... why? What does 32-bit words get you that UTF-8 does not?
> I can't think of a single advantage, just lots of disadvantages.

The advantage is that you can use them to store UCS-4.

Now, for file _contents_ this would be a compatibility disaster, which
is why UTF-8 is so convenient, but for file_names_ UCS-4 lets you
unambiguously represent any string of Unicode characters. Basically -
no more multiple representations of the same thing. No more funny
corner cases where several different strings of bytes eventually
resolve to the same name being presented to the user.

Of course, there is still the issue where the same glyphs could be
displayed on the screen for two files, for example, one called
"vertical bar", and the other one called "pipe", which is confusing,
but that's a _completely_ different issue.

As far as I can see, storing all filenames as either 7-bit ASCII or
flat UCS-4 simply eliminates the whole lot of security issues you're
thinking of. If you never use non 7-bit ASCII, legacy applications
continue to work exactly as before.

John.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/