Re: [PATCH] vfs: check dentry is still valid in get_link()

From: Al Viro
Date: Mon Jan 17 2022 - 14:49:04 EST


On Mon, Jan 17, 2022 at 06:10:36PM +0000, Al Viro wrote:
> On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote:
>
> > IOW, ->free_inode() is RCU-delayed part of ->destroy_inode(). If both
> > are present, ->destroy_inode() will be called synchronously, followed
> > by ->free_inode() from RCU callback, so you can have both - moving just
> > the "finally mark for reuse" part into ->free_inode() would be OK.
> > Any blocking stuff (if any) can be left in ->destroy_inode()...
>
> BTW, we *do* have a problem with ext4 fast symlinks. Pathwalk assumes that
> strings it parses are not changing under it. There are rather delicate
> dances in dcache lookups re possibility of ->d_name contents changing under
> it, but the search key is assumed to be stable.
>
> What's more, there's a correctness issue even if we do not oops. Currently
> we do not recheck ->d_seq of symlink dentry when we dismiss the symlink from
> the stack. After all, we'd just finished traversing what used to be the
> contents of a symlink that used to be in the right place. It might have been
> unlinked while we'd been traversing it, but that's not a correctness issue.
>
> But that critically depends upon the contents not getting mangled. If it
> *can* be screwed by such unlink, we risk successful lookup leading to the
> wrong place, with nothing to tell us that it's happening. We could handle
> that by adding a check to fs/namei.c:put_link(), and propagating the error
> to callers. It's not impossible, but it won't be pretty.
>
> And that assumes we avoid oopsen on string changing under us in the first
> place. Which might or might not be true - I haven't finished the audit yet.
> Note that it's *NOT* just fs/namei.c + fs/dcache.c + some fs methods -
> we need to make sure that e.g. everything called by ->d_hash() instances
> is OK with strings changing right under them. Including utf8_to_utf32(),
> crc32_le(), utf8_casefold_hash(), etc.
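
FWIW, the "recheck when we dismiss the symlink" idea from the quoted text
can be shown with a userspace toy. This is only a sketch: toy_dentry,
toy_walk and the toy_*() helpers are made up for illustration and are not
the fs/namei.c code, the counter is a plain integer rather than a real
seqcount, and it is single-threaded with no barriers. The only thing borrowed
from the kernel is the convention of returning -ECHILD when an RCU-mode walk
has to be retried.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a symlink dentry; not the kernel struct. */
struct toy_dentry {
	unsigned seq;		/* bumped on every change to the object */
	char body[32];		/* symlink contents handed out to the walker */
};

/* Hypothetical per-walk state for one symlink on the stack. */
struct toy_walk {
	const struct toy_dentry *link;
	unsigned seq;		/* value of ->seq sampled at pickup time */
};

/* Pick up the link body and remember the sequence number we saw. */
static const char *toy_get_link(struct toy_walk *w, const struct toy_dentry *d)
{
	w->link = d;
	w->seq = d->seq;
	return d->body;
}

/*
 * Dismiss the link: if the object changed while we were traversing its
 * body, report it so that the caller can retry instead of trusting the
 * result of the lookup.
 */
static int toy_put_link(const struct toy_walk *w)
{
	return w->link->seq == w->seq ? 0 : -ECHILD;
}

/* Model of an unlink that also mangles the body. */
static void toy_unlink(struct toy_dentry *d)
{
	d->seq++;
	memset(d->body, 0, sizeof(d->body));
}
```

The point is only that the error has to travel from the dismissal back to
the caller; in the real thing every caller of put_link() would need to cope
with that, which is where the "won't be pretty" part comes from.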

And AFAICS, ext4, xfs and possibly ubifs (I'm unfamiliar with that one and
the call chains there are deep enough for me to miss something) have the
"bugger the contents of string returned by RCU ->get_link() if unlink()
happens" problem.

I would very much prefer to have them deal with that crap, especially
since I don't see why ext4_evict_inode() needs to do that memset() -
can't we simply check ->i_op in ext4_can_truncate() and be done with
that?
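
Roughly what I mean by checking ->i_op, as a userspace toy. Everything
named toy_* here is made up for illustration and is not the ext4 code. It
also assumes the wipe of the inline body sits behind the can-truncate check,
which needs to be verified against the actual ext4 call chain.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are the real kernel definitions. */
struct toy_inode_operations {
	const char *name;
};

static const struct toy_inode_operations toy_fast_symlink_iops = {
	.name = "fast-symlink",
};
static const struct toy_inode_operations toy_file_iops = {
	.name = "regular",
};

struct toy_inode {
	const struct toy_inode_operations *i_op;
	char i_data[60];	/* inline area; a fast symlink body lives here */
};

/*
 * Identify fast symlinks by the identity of their method table and
 * refuse to truncate them.
 */
static bool toy_can_truncate(const struct toy_inode *inode)
{
	return inode->i_op != &toy_fast_symlink_iops;
}

/* Model of eviction: wipe the inline area only if truncation is allowed. */
static void toy_evict(struct toy_inode *inode)
{
	if (toy_can_truncate(inode))
		memset(inode->i_data, 0, sizeof(inode->i_data));
	/* else leave the bytes alone, an RCU walker may still be parsing them */
}
```

With that, the string handed out by RCU ->get_link() stays intact until the
inode itself is freed, so the walker never sees it change.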