Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: Jamie Lokier
Date: Tue Feb 17 2004 - 14:54:02 EST


Jamie Lokier wrote:
> Understand, this isn't a kernel problem; it is simply a good reason
> to reject malformed UTF-8 by programs which parse UTF-8.

I should make clear: since the kernel _doesn't_ parse UTF-8, the
kernel _isn't_ an appropriate place to reject it.

Any userspace program which treats the names returned by readdir() as
UTF-8 text for any purpose should reject malformed names. The tough
design decisions are: where in the program to do it, and how to ensure
it will always be done.
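To make the userspace check concrete, here is a minimal C sketch of a strict validator. It assumes the RFC 3629 definition of well-formed UTF-8 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). `is_valid_utf8()` and `report_dir()` are hypothetical names for illustration, not part of any existing library.

```c
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if name[0..len) is well-formed UTF-8 per RFC 3629: no overlong
 * encodings, no surrogates (U+D800..U+DFFF), nothing above U+10FFFF. */
int is_valid_utf8(const char *name, size_t len)
{
    const unsigned char *s = (const unsigned char *)name;
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        size_t n;                           /* continuation bytes needed */

        if (c < 0x80) { i++; continue; }                /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }       /* reject overlongs */
        else if (c >= 0xE1 && c <= 0xEC) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }       /* reject surrogates */
        else if (c >= 0xEE && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }       /* reject overlongs */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }       /* cap at U+10FFFF */
        else return 0;   /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

        if (len - i <= n)                   /* sequence truncated by end */
            return 0;
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (size_t k = 2; k <= n; k++)
            if ((s[i + k] & 0xC0) != 0x80)  /* must be 10xxxxxx */
                return 0;
        i += n + 1;
    }
    return 1;
}

/* Hypothetical helper: print the valid UTF-8 names in `path` and reject
 * the rest. Returns the number of rejected names, or -1 if the directory
 * cannot be opened. */
int report_dir(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    int rejected = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        size_t len = strlen(e->d_name);
        if (is_valid_utf8(e->d_name, len)) {
            printf("%s\n", e->d_name);
        } else {
            /* Don't echo the raw bytes into a text stream. */
            fprintf(stderr, "rejected a malformed name (%zu bytes)\n", len);
            rejected++;
        }
    }
    closedir(d);
    return rejected;
}
```

Note that rejecting at the point of readdir(), as `report_dir()` does, means the program can no longer operate on those files at all.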

You have to reject or escape malformed names at _some_ stage when they
are going to appear in a text context. The trouble is that doing it too
soon (where the program calls readdir()) prevents operating on some
files, while doing it later (where the program actually uses the name
in a text context) is easy to forget: by the time a string from
readdir() has travelled through many layers of abstraction between
libraries, its byte-oriented nature is easily overlooked.
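A minimal C sketch of the "escape later" approach, assuming a hypothetical display convention: valid UTF-8 passes through unchanged, while each malformed byte, control byte, and backslash is written as `\xNN` (escaping the backslash keeps the output unambiguous). `escape_name()` and `utf8_seq_len()` are illustrative names, not an existing API; the validity rules are the same RFC 3629 ones as a strict validator would use.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence starting at s, or 0 if
 * the bytes there do not start one. `avail` counts bytes from s onward.
 * Rejects overlongs, surrogates, and code points above U+10FFFF. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char c = s[0], lo = 0x80, hi = 0xBF;
    size_t n;
    if (c < 0x80) return 1;
    else if (c >= 0xC2 && c <= 0xDF) n = 1;
    else if (c == 0xE0) { n = 2; lo = 0xA0; }
    else if (c >= 0xE1 && c <= 0xEC) n = 2;
    else if (c == 0xED) { n = 2; hi = 0x9F; }
    else if (c >= 0xEE && c <= 0xEF) n = 2;
    else if (c == 0xF0) { n = 3; lo = 0x90; }
    else if (c >= 0xF1 && c <= 0xF3) n = 3;
    else if (c == 0xF4) { n = 3; hi = 0x8F; }
    else return 0;
    if (avail <= n || s[1] < lo || s[1] > hi)
        return 0;
    for (size_t k = 2; k <= n; k++)
        if ((s[k] & 0xC0) != 0x80)
            return 0;
    return n + 1;
}

/* Render a raw readdir() name for a text context. Returns the full escaped
 * length; like snprintf, output is truncated (but NUL-terminated) if
 * outsz is too small. A buffer of 4 * strlen(name) + 1 always suffices;
 * a truncated result may end in the middle of a sequence. */
size_t escape_name(const char *name, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)name;
    size_t len = strlen(name), i = 0, o = 0;
    while (i < len) {
        size_t n = utf8_seq_len(s + i, len - i);
        int needs_escape = n == 0 || s[i] < 0x20 || s[i] == 0x7F ||
                           s[i] == '\\';
        if (needs_escape) {
            char buf[5];                      /* "\xNN" plus NUL */
            snprintf(buf, sizeof buf, "\\x%02X", (unsigned)s[i]);
            for (int k = 0; k < 4; k++, o++)
                if (o + 1 < outsz) out[o] = buf[k];
            i++;                              /* escape one byte at a time */
        } else {
            for (size_t k = 0; k < n; k++, o++)
                if (o + 1 < outsz) out[o] = (char)s[i + k];
            i += n;                           /* copy the whole sequence */
        }
    }
    if (outsz > 0)
        out[o < outsz ? o : outsz - 1] = '\0';
    return o;
}
```

In this scheme the raw `d_name` bytes are still what gets passed to open(), rename() and friends; only the copy headed for a terminal, log, or UTF-8 document goes through `escape_name()`. That keeps every file reachable, at the cost of remembering to call it at each text boundary, which is exactly the hazard described above.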

-- Jamie