Re: [PATCH] lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll

From: Al Viro
Date: Thu Jan 26 2023 - 22:54:52 EST


On Mon, Jan 23, 2023 at 11:36:26AM -0800, Linus Torvalds wrote:

> HOWEVER. There is one special exception that might be interesting and
> that has never been done: 'fstat()' and friends could possibly avoid
> the "try_to_unlazy()" even for the last component.
>
> IOW, we might be able to do fstatat() without ever even finalizing the
> RCU state of the pathname, and actually looking up the stat
> information while still in RCU mode, and instead of doing the actual
> final lockref_get_not_dead() to turn an RCU path into a real
> ref-counted path, we could just do the "get the stat info, then do
> read_seqcount_retry() to verify that the RCU walk _and_ the stat info
> is still valid".
>
> But I'm not convinced that final complexity would be worth it. It
> sounds like a potentially fun and interesting exercise (Al Viro added
> to particupants so that he can say "No! That's not 'fun and exciting',
> that's just crazy!") if somebody really wants to, but see above: the
> last path component is very seldom something that sees contention. You
> look up the same root/pwd over and over again, but nobody sane looks
> up the same full pathname over and over again.
>
> And extending the LOOKUP_RCU window all the way over the stat info
> gathering would require a lot of care, and force us to expose the
> kinds of things we do for LOOKUP_RCU in namei.c to fs/stat.c too.

Interesting... So basically a variant of filename_lookup() that
fills struct kstat instead of doing that to struct path?

Not impossible, but will take some care; in particular, I'd consider
non-default ->getattr() as "fall back to non-RCU *now*, we are not
going there" - otherwise it becomes too nasty. STATX_DIOALIGN on
block devices is even worse...

Need to check what struct user_namespace and associated stuff looks
like wrt RCU; OTOH, I wouldn't expect too much headache there,
given that things like may_follow_link() already cover most (if
not everything) of what we'd need in generic_fillattr().

Looks like the main obstacle is how to deal with duplication between
that thing and vfs_getattr{,_nosec}(); one possibility would be to
teach vfs_getattr about the possibility of being called from RCU
mode, and have it used by that new thing. However, I really want
to avoid passing struct nameidata to that sucker. Hmmmm...

OK, so what are the reasons for falling out of RCU mode there and
how early can we detect those?
1) non-NULL ->getattr() (OK, corresponding new bit in
->i_opflags). Can be seen immediately.
2) STATX_DIOALIGN combined with S_ISBLK(). Also can be
seen immediately.
3) security_inode_getattr(). That can get unpleasant -
it exposes apparmor and tomoyo to operating in RCU mode. Until
now they had been spared that. OTOH, they can just fail with
-ECHILD if they see whatever indicator of "you are called in RCU
mode" we end up choosing...

Hell knows - that just might be doable. vfs_getattr() grows a flag
(not sure where - passing inode separately looks like a plausible
approach, with non-NULL meaning "we are in RCU mode"). If it
sees the indication of being called in RCU mode it should return -ECHILD
ASAP if it can't handle things. Old callers just pass it
NULL for inode; filename_statx() (in RCU mode) does something like
err = vfs_getattr(nd->path, nd->inode, ....);
if (err == -ECHILD)
if (!try_to_unlazy(nd))
sod off and restart in non-RCU mode
err = vfs_getattr(nd->path, NULL, ....);
if (!err && (nd->flags & LOOKUP_RCU)) {
do a trimmed-down variant of try_to_unlazy, just checking
the seq counts and not grabbing anything.
}

Oh, wait - there's the rest of the stuff done by complete_walk()...
Ugh... ->d_weak_revalidate() part obviously means "no RCU for you",
but it needs to be handled for non-RCU modes and there's LOOKUP_IS_SCOPED
part...

I'll poke around at some point, but I've a bunch of other pending crap
at the moment. It _might_ be doable without too much ugliness, but
then this is just from a cursory look - there might be dragons.