Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

From: Oleg Drokin
Date: Mon Jul 04 2016 - 22:28:22 EST



On Jul 3, 2016, at 2:29 AM, Al Viro wrote:

> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>
>> Sorry to nag you about this, but did any of those pan out?
>>
>> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in with
>> a dentry already (though a potentially shared one, I understand).
>> Would not it be better to try and establish some dentry locking rule for calling into
>> d_splice_alias() instead? At least then the callers can make sure the dentry does
>> not change under them?
>> Though I guess if there's dentry locking like that, we might as well do all the
>> checking in d_splice_alias(), but that means the unhashed dentries would no
>> longer be disallowed which is a change of semantic from now.--
>
> FWIW, the only interesting case here is this:
> * no O_CREAT in flags (otherwise the parent is held exclusive).
> * dentry is found in hash
> * dentry is negative
> * dentry has passed ->d_revalidate() (i.e. in case of
> NFS it had nfs_neg_need_reval() return false).
>
> Only two instances are non-trivial in that respect - NFS and Lustre.
> Everything else will simply fail open() with ENOENT in that case.
>
> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
> finish_no_open and bugger off in case it's not in_lookup, otherwise do
> pretty much what we do in case we'd got in_lookup from the very beginning.
> Some adjustments are needed for that case (basically, we need to make
> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
> with refcounting correctly).
>
> Tentative NFS patch follows; I don't understand Lustre well enough, but it
> looks like a plausible strategy there as well.

This patch seems to have brought the other crash back in (or something similar),
the one with a negative dentry being hashed when it's already hashed.
it's not as easy to hit as before, but a lot easier than the race we are hitting here.

Also in all cases it is ls that is crashing now, which seems to highlight it is
taking some of this new paths added.
I have half a dosen crashdumps if you want me to look something up there.
This is on top of 4.7.0-rc6 a99cde438de0c4c0cecc1d1af1a55a75b10bfdef
with just your patch on top.

I can reproduce it with both the complex workload, or with a simplified one (attached),
takes anywhere from 5 to 30 minutes to hit.

>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index d8015a03..5474e39 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> struct file *file, unsigned open_flags,
> umode_t mode, int *opened)
> {
> + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> struct nfs_open_context *ctx;
> struct dentry *res;
> struct iattr attr = { .ia_valid = ATTR_OPEN };
> struct inode *inode;
> unsigned int lookup_flags = 0;
> + bool switched = false;
> int err;
>
> /* Expect a negative dentry */
> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> attr.ia_size = 0;
> }
>
> + if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
> + d_drop(dentry);
> + switched = true;
> + dentry = d_alloc_parallel(dentry->d_parent,
> + &dentry->d_name, &wq);

Hm, d_alloc_parallel can return some preexisting dentry it was able to lookup in
rcu attached to the same parent with the same name it seems?
What's to stop a parallel thread to rehash the dentry after we dropped it here,
and so d_alloc_parallel will happily find it?
I am running a test to verify this theory now.

> + if (IS_ERR(dentry))
> + return PTR_ERR(dentry);
> + if (unlikely(!d_in_lookup(dentry)))
> + return finish_no_open(file, dentry);
> + }
> +
> ctx = create_nfs_open_context(dentry, open_flags);
> err = PTR_ERR(ctx);
> if (IS_ERR(ctx))
> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> put_nfs_open_context(ctx);
> out:
> + if (unlikely(switched)) {
> + d_lookup_done(dentry);
> + dput(dentry);
> + }
> return err;
>
> no_open:
> res = nfs_lookup(dir, dentry, lookup_flags);
> - err = PTR_ERR(res);
> + if (switched) {
> + d_lookup_done(dentry);
> + if (!res)
> + res = dentry;
> + else
> + dput(dentry);
> + }
> if (IS_ERR(res))
> - goto out;
> -
> + return PTR_ERR(res);
> return finish_no_open(file, res);
> }
> EXPORT_SYMBOL_GPL(nfs_atomic_open);