Re: fs: NULL deref in atime_needs_update

From: Al Viro
Date: Mon Feb 29 2016 - 11:11:55 EST


On Mon, Feb 29, 2016 at 01:34:13PM +0100, Dmitry Vyukov wrote:

> It's not that I really understand what happens here, but looking at
> the diff: is it the case that negative and inode can change under our
> feet? If so, we still probably can get an inconsistent picture (i.e.
> negative dentry but not NULL inode), can it be an issue? Is
> non-negative->negative->non-negative->negative transition possible? If
> so, we still probably can get the same crash regardless of order of
> negative/inode loads.

Yes, we can - relying on the ordering is brittle and wrong. See other
posting for possible solution; at least that one has much more understandable
rules:
* ->d_seq is bumped before and after modifications of ->d_inode and
->d_flags, which both provides the barriers and (which is what matters for
x86) guarantees that ->d_seq match on recheck (which we do anyway) means
that ->d_inode and ->d_flags match each other.
* RCU users of that part of ->d_flags should be verified by ->d_seq
check (we already are doing that - should_follow_link() didn't and that was
one of the bugs that got fixed).
* non-RCU users either have the parent locked (which stabilizes
everything) or have dentry pinned and positive (ditto). Checking that
dentry is negative (either by looking at inode or flags) does not guarantee
that it will stay such unless the parent is locked anyway. IOW, the
games with barriers and order of assignments between ->d_inode and ->d_flags
do not actually buy us anything useful.
* in case of __dentry_kill() we do *NOT* surround the stores to
->d_inode/->d_flags with ->d_seq bumps, but that's safe since by that point
we had already done __d_drop(), so RCU reader either doesn't find the
dentry in the first place, or gets ->d_seq bumped (by 2) between the
moment it's been fetched by __d_lookup_rcu() finding the sucker and
the moment when ->d_inode/->d_flags get changed. If RCU reader gets in
before that, it sees consistent ->d_inode/->d_flags, as they used to be.
It will eventually fail to grab a reference to that dentry, but that's not
our problem. If it gets in late enough to see ->d_inode and/or ->d_flags
changed, it will fail ->d_seq check and ignore the values it has fetched.
Again, no need for a barrier between ->d_inode and ->d_flags stores in
that case.

As the matter of fact, I'm somewhat tempted to make
static void dentry_unlink_inode(struct dentry * dentry)
__releases(dentry->d_lock)
__releases(dentry->d_inode->i_lock)
{
struct inode *inode = dentry->d_inode;
bool hashed = !d_unhashed(dentry);

if (hashed)
raw_write_seqcount_begin(&dentry->d_seq);
__d_clear_type_and_inode(dentry);
hlist_del_init(&dentry->d_u.d_alias);
if (hashed)
raw_write_seqcount_end(&dentry->d_seq);
spin_unlock(&dentry->d_lock);
spin_unlock(&inode->i_lock);
if (!inode->i_nlink)
fsnotify_inoderemove(inode);
if (dentry->d_op && dentry->d_op->d_iput)
dentry->d_op->d_iput(dentry, inode);
else
iput(inode);
}

and replace dentry_iput() in its only caller with

if (dentry->d_inode)
dentry_unlink_inode(dentry); /* will drop ->d_lock */
else
spin_unlock(&dentry->d_lock);

That would get rid of annoying code duplication, but I would like to see
profiles - the cost of branches might very well get unpleasant. Not sure,
and that part definitely isn't a -stable fodder.