Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: Jamie Lokier
Date: Tue Feb 17 2004 - 15:45:34 EST


viro@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx wrote:
> So what you are saying is that conversion of invalid multibyte sequences
> into non-error wide chars followed by conversion back into UTF-8 can
> lead to trouble? *DUH*

Yes. (The point being that it's a common bug, just like buffer
overflows are common. Rejecting malformed UTF-8 is a defensive
strategy against it).

> > The holes only arise because software which is interpreting UTF-8 is
> > mixed with software which isn't. That's one of the most useful
> > features of UTF-8, after all - that's why we use it for filenames.
>
> The holes only arise because software which is interpreting UTF-8 doesn't
> care to do it properly.

That's right. Software which does it properly rejects malformed UTF-8.
That is the entire point of my post.

> Software that doesn't interpret it (including the kernel) doesn't
> enter the picture at all.

Yes.

Your posting merely repeated what I said, so I assume we're in agreement :)

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/