Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)

From: Jamie Lokier
Date: Tue Feb 17 2004 - 11:38:59 EST


Linus Torvalds wrote:
> Which flies in the face of "Be strict in what you generate, be liberal in
> what you accept". A lot of the functions are _not_ willing to be liberal
> in what they accept. Which sometimes just makes the problem worse, for no
> good reason.

Unicode specifies that a program claiming to read UTF-8 _must_ reject
malformed UTF-8.

Ok, we can just ignore Unicode. :)

But the reason they cite is security: when applications allow
malformed UTF-8 through, there's plenty of scope for security holes
due to multiple (overlong) encodings of "/", "." and "\0".
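
To make the "multiple encodings" point concrete, here is a small
Python sketch (not from the original message). lenient_decode is a
hypothetical, deliberately naive decoder that handles only 1- and
2-byte sequences and skips the shortest-form check; strict_decode
simply relies on Python's built-in codec, which rejects such input.
The bytes C0 AF are the well-known overlong 2-byte form of "/".

```python
def lenient_decode(data: bytes) -> str:
    """Hypothetical naive decoder: accepts overlong 2-byte sequences."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
        elif (b & 0xE0) == 0xC0 and i + 1 < len(data) \
                and (data[i + 1] & 0xC0) == 0x80:
            # 2-byte sequence, with NO check that the result is >= 0x80.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            # Anything else is replaced; longer forms are out of scope here.
            out.append("\ufffd")
            i += 1
    return "".join(out)


def strict_decode(data: bytes) -> str:
    """Python's own codec refuses overlong forms such as C0 AF."""
    return data.decode("utf-8", errors="strict")


raw = b"..\xc0\xaf..\xc0\xafetc"

# A byte-level check for "/" sees nothing suspicious in the raw input...
slash_in_raw = b"/" in raw

# ...but the lenient decoder turns it into a path with separators.
decoded = lenient_decode(raw)

try:
    strict_decode(raw)
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True
```

Any check done on the raw bytes and any use of the leniently decoded
string disagree about whether a "/" is present, which is exactly the
gap an attacker needs.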

This is a real problem: plenty of those Windows worms that attack web
servers get in by using multiple-escaped funny characters and
malformed UTF-8 to get past security checks for ".." and such.
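
The multiple-escaping half of this can be sketched the same way. The
handler names below are hypothetical and only illustrate the ordering
bug (check after one decode, then decode again in a later layer);
urllib.parse.unquote is the standard-library percent-decoder.

```python
from urllib.parse import unquote


def looks_safe(path: str) -> bool:
    """Naive security check: refuse anything containing '..'."""
    return ".." not in path


def vulnerable_handler(raw: str) -> str:
    """Hypothetical handler: checks after ONE decode, then decodes again."""
    once = unquote(raw)
    if not looks_safe(once):
        raise ValueError("path traversal rejected")
    # A later layer decodes a second time, after the check has passed.
    return unquote(once)


def safer_handler(raw: str) -> str:
    """Decode exactly once, check the result, and never decode it again."""
    path = unquote(raw)
    if not looks_safe(path):
        raise ValueError("path traversal rejected")
    return path


attack = "%252e%252e%252fetc%252fpasswd"   # "%25" is an escaped "%"

leaked = vulnerable_handler(attack)        # the check saw "%2e%2e%2f..."
literal = safer_handler(attack)            # stays an odd but harmless name
```

The check in vulnerable_handler is not wrong in itself; it is simply
applied to a different string than the one that is finally used.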

In theory these are not problems; all programs should be liberal in
what they accept, and robust in handling data from the outside world.

In practice, programs quickly lose track of which text is from the
outside world and which is from a trusted or already-checked source.
These worms are quite successful at exploiting things the programmers
didn't think of. Being _conservative_ at all places which scan UTF-8
does seem like it might help a little.
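
As a rough illustration of being conservative at the place that scans
the text, rather than trusting an earlier check, here is a Python
sketch. checked_component is a hypothetical helper, not any real VFS
or library interface; it leans on Python's strict UTF-8 codec, which
rejects overlong forms and encoded surrogates.

```python
def checked_component(raw: bytes) -> str:
    """Hypothetical helper: strictly decode and vet one path component."""
    try:
        # Strict decoding refuses malformed input outright.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"malformed UTF-8 at byte {exc.start}") from exc
    # Only after decoding do we look for the special names and characters.
    if text in ("", ".", "..") or "/" in text or "\0" in text:
        raise ValueError("unsafe path component")
    return text


def is_rejected(raw: bytes) -> bool:
    """Convenience wrapper for trying inputs without a try/except each time."""
    try:
        checked_component(raw)
    except ValueError:
        return True
    return False
```

Because the decode and the check happen in one function on the same
bytes, there is no window in which an unchecked or differently decoded
string can slip through. Calling it at every point of use costs a
little, but it does not depend on remembering where the data came from.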

-- Jamie