Re: [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions [ver #2]

From: David Howells
Date: Thu Apr 25 2019 - 11:45:31 EST


Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:

> Hmm... Why do these stores to ->i_state need WRITE_ONCE, while an arseload
> of similar in fs/fs-writeback.c does not?

Because what matters in find_inode_rcu() are the I_WILL_FREE and I_FREEING
flags - and there's a gap during iput_final() where neither is set.

	if (!drop) {
		inode->i_state |= I_WILL_FREE;
		spin_unlock(&inode->i_lock);
		write_inode_now(inode, 1);
		spin_lock(&inode->i_lock);
		WARN_ON(inode->i_state & I_NEW);
		inode->i_state &= ~I_WILL_FREE;
	--->
	}

	inode->i_state |= I_FREEING;

It's normally covered by i_lock, but it's a problem if anyone looks at the
pair without taking i_lock.

Even flipping the order:

	if (!drop) {
		inode->i_state |= I_WILL_FREE;
		spin_unlock(&inode->i_lock);
		write_inode_now(inode, 1);
		spin_lock(&inode->i_lock);
		WARN_ON(inode->i_state & I_NEW);
		inode->i_state |= I_FREEING;
		inode->i_state &= ~I_WILL_FREE;
	} else {
		inode->i_state |= I_FREEING;
	}

doesn't, AIUI, guarantee the order in which the compiler will actually emit
the two stores. Maybe I've been listening to Paul McKenney too much. So
storing the combined new value with a single WRITE_ONCE() should guarantee
that both bits will change atomically.
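
To illustrate the idea, here is a rough user-space sketch, not the code from
the patch. The flag values, the WRITE_ONCE()/READ_ONCE() stand-ins (plain
volatile accesses using the GCC/Clang __typeof__ extension) and both helper
names are assumptions made up for the example. The point is only that the new
state is computed in a local and published with one store, so a lockless
reader sees either I_WILL_FREE or I_FREEING, never neither.

```c
#include <assert.h>

/* Illustrative stand-ins; the real definitions live in the kernel headers. */
#define I_WILL_FREE	(1UL << 4)
#define I_FREEING	(1UL << 5)

/* User-space approximation of the kernel macros: one volatile access each. */
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)		(*(volatile __typeof__(x) *)&(x))

/* Hypothetical helper: swap I_WILL_FREE for I_FREEING with a single store. */
static void will_free_to_freeing(unsigned long *state)
{
	unsigned long old = *state;

	/* The caller is expected to have set I_WILL_FREE beforehand. */
	assert(old & I_WILL_FREE);

	/* Build the new value locally, then publish it in one go. */
	WRITE_ONCE(*state, (old & ~I_WILL_FREE) | I_FREEING);
}

/* Hypothetical lockless check: is the inode on its way out? */
static int inode_going_away(const unsigned long *state)
{
	return (READ_ONCE(*state) & (I_WILL_FREE | I_FREEING)) != 0;
}
```

In the kernel the writer would still hold i_lock; only the reader side goes
without it.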

Note that ocfs2_drop_inode() looks a tad suspicious:

	int ocfs2_drop_inode(struct inode *inode)
	{
		struct ocfs2_inode_info *oi = OCFS2_I(inode);

		trace_ocfs2_drop_inode((unsigned long long)oi->ip_blkno,
				       inode->i_nlink, oi->ip_flags);

		assert_spin_locked(&inode->i_lock);
		inode->i_state |= I_WILL_FREE;
		spin_unlock(&inode->i_lock);
		write_inode_now(inode, 1);
		spin_lock(&inode->i_lock);
		WARN_ON(inode->i_state & I_NEW);
		inode->i_state &= ~I_WILL_FREE;

		return 1;
	}

David