Re: [GIT PULL] Ceph fixes for 5.1-rc7

From: Al Viro
Date: Sun Apr 28 2019 - 12:40:45 EST


On Sun, Apr 28, 2019 at 04:52:16PM +0100, Al Viro wrote:
> On Sun, Apr 28, 2019 at 11:47:58AM -0400, Jeff Layton wrote:
>
> > We could stick that in ceph_dentry_info (->d_fsdata). We have a flags
> > field in there already.
>
> Yes, but... You have it freed in ->d_release(), AFAICS, and without
> any delays. So lockless accesses will be trouble.

You could RCU-delay the actual kmem_cache_free(ceph_dentry_cachep, di)
in there, but I've no idea whether the overhead would be painful -
on massive eviction (e.g. on memory pressure) it might be. Another
variant is to introduce ->d_free(), to be called from __d_free()
and __d_free_external(). That, however, would need another ->d_flags
bit to record the presence of that method, so that we don't get the
extra overhead of looking into ->d_op for every dentry being freed...
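A userspace sketch of the ->d_free() variant shows why the extra ->d_flags bit matters. The "has ->d_free" check is done once, when ->d_op is set (the same idea d_set_d_op() already uses for DCACHE_OP_PRUNE and friends). The free path then dereferences ->d_op only for dentries that actually have the method. Everything prefixed sim_/SIM_ (including the SIM_DCACHE_OP_FREE bit) is hypothetical and is not a real VFS interface. There is no RCU here: sim_d_free() merely stands in for the RCU callbacks __d_free()/__d_free_external().

```c
#include <stdlib.h>

/* Hypothetical ->d_flags bit: "->d_op provides ->d_free". Not a real flag. */
#define SIM_DCACHE_OP_FREE 0x1u

struct sim_dentry;

struct sim_dentry_operations {
	void (*d_free)(struct sim_dentry *dentry);
};

struct sim_dentry {
	unsigned int d_flags;
	const struct sim_dentry_operations *d_op;
	void *d_fsdata;		/* fs-private data, e.g. ceph_dentry_info */
};

static int sim_fs_free_calls;	/* how many times the fs hook ran */

/* Filesystem side: private data is freed only when the dentry itself is. */
static void sim_fs_d_free(struct sim_dentry *dentry)
{
	free(dentry->d_fsdata);
	dentry->d_fsdata = NULL;
	sim_fs_free_calls++;
}

static const struct sim_dentry_operations sim_fs_ops = {
	.d_free = sim_fs_d_free,
};

/* Set ->d_op and cache "has ->d_free" in ->d_flags once, at setup time. */
void sim_set_d_op(struct sim_dentry *dentry,
		  const struct sim_dentry_operations *op)
{
	dentry->d_op = op;
	if (op && op->d_free)
		dentry->d_flags |= SIM_DCACHE_OP_FREE;
}

/*
 * Stand-in for __d_free()/__d_free_external(), which in the kernel run
 * after an RCU grace period. Dentries without the flag never touch ->d_op.
 */
void sim_d_free(struct sim_dentry *dentry)
{
	if (dentry->d_flags & SIM_DCACHE_OP_FREE)
		dentry->d_op->d_free(dentry);
	free(dentry);
}
```

A dentry set up with sim_set_d_op(d, &sim_fs_ops) gets its fs-private data released from sim_d_free(). A dentry with no ->d_op costs only a flag test on the free path.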

Looking through ->d_release() instances, we have:

afs: empty, might as well not have been there

autofs: does some sync stuff (eviction from ->active_list/->expire_list)
plus kfree_rcu

ceph: some sync stuff + immediate kmem_cache_free()

debugfs: kfree(), might or might not be worth RCU-delaying

ecryptfs: sync stuff (path_put for ->lower) + RCU-delayed part

fuse: kfree_rcu()

nfs: kfree()

overlayfs: a bunch of dput() (obviously sync) + kfree_rcu()

9p: sync

So it actually might make sense to move the RCU-delayed bits to a
separate method. Some ->d_release() instances would simply be
gone; as for the rest... I wonder which of the sync parts can
be moved over to ->d_prune(). Not guaranteed to be doable
(or a good idea), but... E.g. for autofs it almost certainly
would be the right place for the sync parts - we are,
essentially, telling the filesystem to forget its private
(non-refcounted) references to the victim.