Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)

From: John Bradford
Date: Thu Feb 12 2004 - 12:07:56 EST


Quote from Robin Rosenberg <robin.rosenberg.lists@xxxxxxxxxx>:
> On Thursday 12 February 2004 17.17, you wrote:
> > Another thing to consider is that you can encode the same character in
> > several ways using utf8, so two filenames could have different byte
> > strings, but evaluate to the same set of unicode characters.
>
> No. That's not UTF-8.

Please don't break the CC list on replies.

I'm not sure whether it's valid UTF-8 or not, but it's certainly
possible to encode, for example, an 'A' (decimal 65) as a multi-byte
sequence, right up to the six-byte form that carries a 31-bit value.
Presumably the majority of UTF-8 parsers would decode such a sequence
as 65, rather than emit an error.
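
To make the overlong case concrete, here is a minimal Python sketch. It
is an illustration only: naive_decode_one is a hypothetical helper, not
code from any real parser. It strips the length marker and concatenates
the payload bits without checking that the shortest form was used.
Python's built-in strict decoder is shown for contrast; current UTF-8
definitions (RFC 3629, for example) forbid overlong forms, so it rejects
both sequences.

```python
def naive_decode_one(data: bytes) -> int:
    """Decode one code point with no validation (hypothetical lenient parser)."""
    lead = data[0]
    if lead < 0x80:
        return lead  # plain one-byte ASCII
    # Count the leading 1 bits of the first byte to get the sequence length.
    length = 0
    mask = 0x80
    while lead & mask:
        length += 1
        mask >>= 1
    # Keep the payload bits of the lead byte, then 6 bits per continuation byte.
    value = lead & (mask - 1)
    for b in data[1:length]:
        value = (value << 6) | (b & 0x3F)
    return value


def strict_accepts(data: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


shortest = b"A"                            # the only well-formed encoding of 65
overlong_2 = b"\xc1\x81"                   # two-byte overlong form of 65
overlong_6 = b"\xfc\x80\x80\x80\x81\x81"   # six-byte (31-bit) overlong form of 65

for seq in (shortest, overlong_2, overlong_6):
    print(seq, naive_decode_one(seq), strict_accepts(seq))
```

A lenient decoder of this kind maps all three byte strings to 65, which
is exactly the aliasing problem for filenames; a strict one accepts only
the first.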

Also, even ignoring that, how do you handle things like accented
characters, which can be represented either as single characters or as
sequences containing combining characters? Some applications might
convert the sequence containing combining characters into the single
character, and others might not.
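
A short Python sketch of the combining-character case, using the
standard unicodedata module. The choice of 'e' with an acute accent is
only an example; the variable names are made up for illustration. Both
strings below are valid UTF-8, yet they differ byte for byte unless
something normalizes them first.

```python
import unicodedata

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# The two spellings encode to different byte strings.
print(precomposed.encode("utf-8"))   # b'\xc3\xa9'
print(decomposed.encode("utf-8"))    # b'e\xcc\x81'

# Compared as-is, they are different strings (and different filenames).
same_raw = precomposed == decomposed

# After normalizing both to the same form, they compare equal.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(same_raw, same_nfc, same_nfd)
```

An application that normalizes to NFC before creating a file and one
that passes the decomposed form through untouched would end up with two
distinct names that display identically.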

John.