Re: [PATCH 9/9] nfs: don't open in ->d_revalidate

From: Myklebust, Trond
Date: Tue Mar 06 2012 - 11:41:21 EST


On Tue, 2012-03-06 at 17:26 +0100, Miklos Szeredi wrote:
> "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> writes:
>
> > On Tue, 2012-03-06 at 13:56 +0100, Miklos Szeredi wrote:
> >> From: Miklos Szeredi <mszeredi@xxxxxxx>
> >>
> >> NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a
> >> mount needs to be followed or not. It does check d_mountpoint() on the dentry,
> >> which can result in a weird error if the VFS found that the mount does not in
> >> fact need to be followed, e.g.:
> >>
> >> # mount --bind /mnt/nfs /mnt/nfs-clone
> >> # echo something > /mnt/nfs/tmp/bar
> >> # echo x > /tmp/file
> >> # mount --bind /tmp/file /mnt/nfs-clone/tmp/bar
> >> # cat /mnt/nfs/tmp/bar
> >> cat: /mnt/nfs/tmp/bar: Not a directory
> >>
> >> Which should, by any sane filesystem, result in "something" being printed.
> >>
> >> So instead do the open in f_op->open() and in the unlikely case that the cached
> >> dentry turned out to be invalid, drop the dentry and return ESTALE to let the
> >> VFS retry.
> >
> > This patch would force a complete new walk of the path in cases where
> > today we just do a single lookup of the last component.
>
> I sort of anticipated that, but I really don't know how much of an issue
> is this.

It makes a big difference to the cache miss case, since not only is the
entire path looked up again, but the LOOKUP_REVAL flag gets set, which
forces a full lookup of each component (as opposed to just
revalidating).

> Of course it is possible to redo only the last component after
> f_op->open() return ESTALE but that requires reshuffling do_last().
>
> > It really
> > doesn't seem worth taking that penalty just in order to make some insane
> > bind mount corner cases work.
>
> It's not at all insane if for example we have multiple mount namespaces.
> NFS is clearly broken and it needs to be fixed, one way or another.

Right, but this is still an extremely rare usage case: you are the first
person to report it as being broken (OK, Al may have mentioned it in the
past too). There have been no actual user reports AFAIK.

OTOH, cache misses are frequent.

> If you think this is a serious performance regression then lets drop
> this patch and add it to the atomic open series together with the
> do_last() reshuffling.

That sounds like a good suggestion to me.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com

èº{.nÇ+‰·Ÿ®‰­†+%ŠËlzwm…ébëæìr¸›zX§»®w¥Š{ayºÊÚë,j­¢f£¢·hš‹àz¹®w¥¢¸ ¢·¦j:+v‰¨ŠwèjØm¶Ÿÿ¾«‘êçzZ+ƒùšŽŠÝj"ú!¶iO•æ¬z·švØ^¶m§ÿðà nÆàþY&—