Re: UTF-8 practically vs. theoretically in the VFS API (was: Re:JFS default behavior)

From: Linus Torvalds
Date: Tue Feb 17 2004 - 14:49:15 EST




On Tue, 17 Feb 2004, Jamie Lokier wrote:
>
> Well, the security checks on ".." which worms get past aren't in the
> kernel either. This time _you_ made the strawman :)

Note that this is something that the kernel _can_ fix easily.

In particular, we already have flags like LOOKUP_FOLLOW and
LOOKUP_DIRECTORY that we use internally in the kernel to specify how to do
certain operations. We export _part_ of that to user space with the
O_DIRECTORY flag that says "allow open of directories".

And yes, we have security-related ones too (LOOKUP_NOALT disables the
alternamte mount-point lookup).

And it would be _trivial_ to add a LOOKUP_NODOTDOT and allow user space to
use it through a O_NODOTDOT thing. But the people who need it really need
to do it and test it, and they need to be committed enough that they say
"yes, we'd use this, even though it's not portable". Because I don't want
to add features to the kernel that people don't use, and a lot of the
users don't want to use Linux-only things..

Same goes for O_NOFOLLOW or O_NOMOUNT, to tell the kernel that it
shouldn't follow symbolic links or cross mount-points - another thing that
some software might want to use in order to check that you can't "escape"
your subtree.

So these things would be literally trivial to add, and the only issue is
whether people would really use them.

> What happens is that one program or library checks an incoming path
> for ".." components - that code knows nothing about UTF-8 of course.
>
> Then it passes the string to another program which assumes the path
> has been subject to appropriate security checks, munges it in UTF-8,
> and eventually does a file operation with it. The munging generates
> ".." components from non-minimal UTF-8 forms - if it's not obeying the
> Unicode rejection requirement (which wasn't in earlier versions), that is.

But note how my point was that YOU SHOULD NEVER EVER MUNGE A PATHNAME!

It is fundamentally _wrong_ to convert pathnames. You _cannot_ do it
correctly.

The rule should be:
- convert user-input to UTF-8 early (do _nothing_ to it before the
conversion). Allow escape sequences here.
- never ever convert readdir/getcwd/etc system-specified paths AT ALL.
They are already in "extended UTF-8" format (where the "extended" part
is the 'broken UTF-8' thing. I can be like MS and call my breakage
"extended" too ;)
- always _always_ work on the "extended UTF-8" format, and never EVER
convert that to anything else (except when you need to actually print
it, but then you encode it properly with escape sequences, the way you
have to _anyway_).

If you follow the above simple rules, you can't get it wrong. And in those
rules, ".." is the BYTE SEQUENCE in the "extended UTF-8". Nothing more.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/