Re: [PATCH] vfs: check dentry is still valid in get_link()

From: Al Viro
Date: Mon Jan 17 2022 - 13:10:46 EST


On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote:

> IOW, ->free_inode() is RCU-delayed part of ->destroy_inode(). If both
> are present, ->destroy_inode() will be called synchronously, followed
> by ->free_inode() from RCU callback, so you can have both - moving just
> the "finally mark for reuse" part into ->free_inode() would be OK.
> Any blocking stuff (if any) can be left in ->destroy_inode()...
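
A userspace sketch of the ordering described in the quoted text, for
illustration only. Everything named model_* is hypothetical, and the
deferred list merely stands in for an RCU grace period; this is not the
kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; only the two states of interest. */
struct model_inode {
	bool torn_down;		/* set by the synchronous part */
	bool reusable;		/* set only by the delayed part */
	struct model_inode *next_deferred;
};

/* Callbacks waiting for the (simulated) grace period to end. */
static struct model_inode *deferred_head;

/* Synchronous part: blocking work may live here; memory stays untouched. */
static void model_destroy_inode(struct model_inode *inode)
{
	inode->torn_down = true;
}

/* Delayed part: nothing but "finally mark for reuse". */
static void model_free_inode(struct model_inode *inode)
{
	assert(inode->torn_down);	/* the synchronous part always ran first */
	inode->reusable = true;
}

/* Eviction: run the synchronous part now, queue the delayed part. */
static void model_evict(struct model_inode *inode)
{
	model_destroy_inode(inode);
	inode->next_deferred = deferred_head;
	deferred_head = inode;
}

/* Stand-in for the grace period elapsing: run every queued callback. */
static void model_grace_period_end(void)
{
	while (deferred_head) {
		struct model_inode *inode = deferred_head;

		deferred_head = inode->next_deferred;
		model_free_inode(inode);
	}
}
```

In this model, a lockless reader that still holds a pointer sees torn_down
but never reusable until model_grace_period_end() has run.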

BTW, we *do* have a problem with ext4 fast symlinks. Pathwalk assumes that
strings it parses are not changing under it. There are rather delicate
dances in dcache lookups re possibility of ->d_name contents changing under
it, but the search key is assumed to be stable.

What's more, there's a correctness issue even if we do not oops. Currently
we do not recheck ->d_seq of symlink dentry when we dismiss the symlink from
the stack. After all, we'd just finished traversing what used to be the
contents of a symlink that used to be in the right place. It might have been
unlinked while we'd been traversing it, but that's not a correctness issue.

But that critically depends upon the contents not getting mangled. If it
*can* be screwed by such an unlink, we risk a successful lookup leading to the
wrong place, with nothing to tell us that it's happening. We could handle
that by adding a check to fs/namei.c:put_link(), and propagating the error
to callers. It's not impossible, but it won't be pretty.
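
Roughly what such a check would amount to, as a userspace model with C11
atomics rather than actual fs/namei.c code. The model_* names and both
structs are hypothetical; returning -ECHILD is an assumption borrowed from
how failed RCU-mode walks are usually reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the sequence counter matters. */
struct model_dentry {
	atomic_uint seq;	/* even: stable, odd: writer in progress */
};

/* Hypothetical stand-in for one entry of the symlink stack. */
struct model_saved {
	struct model_dentry *dentry;
	unsigned int seq;	/* snapshot taken when the link was picked up */
};

/* Sample the counter, waiting out any writer that is mid-update. */
static unsigned int model_read_begin(struct model_dentry *d)
{
	unsigned int seq;

	while ((seq = atomic_load_explicit(&d->seq, memory_order_acquire)) & 1)
		;
	return seq;
}

/* True if the counter moved since the snapshot was taken. */
static bool model_read_retry(struct model_dentry *d, unsigned int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&d->seq, memory_order_relaxed) != seq;
}

/* Writer side (e.g. unlink in this model): make all old snapshots stale. */
static void model_invalidate(struct model_dentry *d)
{
	atomic_fetch_add_explicit(&d->seq, 2, memory_order_release);
}

/* Picking up the symlink: remember which dentry and which sequence. */
static void model_get_link(struct model_saved *s, struct model_dentry *d)
{
	assert(d != NULL);
	s->dentry = d;
	s->seq = model_read_begin(d);
}

/* Dismissing the symlink: recheck, and hand an error back to the caller. */
static int model_put_link(struct model_saved *s)
{
	if (model_read_retry(s->dentry, s->seq))
		return -ECHILD;
	return 0;
}
```

The ugly part is not the check itself but that model_put_link() now returns
something, so every caller has to deal with a failure at that point.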

And that assumes we avoid oopsen on the string changing under us in the first
place. Which might or might not be true - I haven't finished the audit yet.
Note that it's *NOT* just fs/namei.c + fs/dcache.c + some fs methods -
we need to make sure that e.g. everything called by ->d_hash() instances
is OK with strings changing right under them. Including utf8_to_utf32(),
crc32_le(), utf8_casefold_hash(), etc.
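
For illustration, the kind of property the audit is looking for, shown on a
toy hash. This is plain FNV-1a, not what any ->d_hash() instance actually
uses, and model_read_once() is a hypothetical stand-in for READ_ONCE(). The
point is only that the function reads each byte once and never trusts a
terminator to stay where it was.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for READ_ONCE() on a byte: exactly one load. */
static unsigned char model_read_once(const unsigned char *p)
{
	return *(const volatile unsigned char *)p;
}

/*
 * FNV-1a over exactly len bytes.  The loop is bounded by len alone, so a
 * NUL appearing or disappearing in the buffer cannot make it run past the
 * end; a concurrent change can only yield a stale hash, not a bad access.
 */
static uint32_t model_hash(const unsigned char *name, size_t len)
{
	uint32_t hash = 2166136261u;	/* FNV offset basis */
	size_t i;

	assert(name != NULL);
	for (i = 0; i < len; i++) {
		/* Read the byte once; never re-read it for a later decision. */
		unsigned char c = model_read_once(&name[i]);

		hash ^= c;
		hash *= 16777619u;	/* FNV prime */
	}
	return hash;
}
```

A stale hash is something a later sequence recheck can catch; walking off the
end of the buffer, or branching twice on a byte that changed in between, is
not.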