Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: viro
Date: Tue Feb 17 2004 - 12:53:50 EST


On Tue, Feb 17, 2004 at 04:36:13PM +0000, Jamie Lokier wrote:
> But the reason they cite is security: when applications allow
> malformed UTF-8 through, there's plenty of scope for security holes
> due to multiple encodings of "/" and "." and "\0".
>
> This is a real problem: plenty of those Windows worms that attack web
> servers get in by using multiple-escaped funny characters and
> malformed UTF-8 to get past security checks for ".." and such.

Pardon? For that kernel would have to <drumrolls> interpret the bytestream
as UTF-8. We do not. So your malformed UTF-8 for .. won't be treated as
.. by the kernel.

BTW, speaking of Plan 9, they do *NOT* reject malformed UTF-8 in pathnames.
Filtering they do is against ASCII controls - i.e. \1--\37 and \177.

All differences between our generic checks and Plan 9 generic checks (aside
of whatever checks particular fs might do) is:
1) they allow longer pathnames (64K vs our 4K, from my reading of
9/port/chan.c)
2) they do not allow pathnames containing any octet in range 1--31
3) they do not allow pathnames containing DEL (octet 127)

The rest is identical:

* Pathname is split into components by instances of octet 47 (/).
* Component is special if it's {octet 46} or {octet 46, octet 46}
(. and .. resp.).
* Name is terminated by octet 0 (NUL).
* Name components are fed to filesystem drivers without any conversions
- they go as arrays of char, with no concern for encoding.

So could we please put that strawman to rest?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/