Re: [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' ratherthan 'permission()'

From: Linus Torvalds
Date: Wed Sep 09 2009 - 13:11:42 EST




On Tue, 8 Sep 2009, Christoph Hellwig wrote:
>
> The split of these patches is a bit odd, either do all in one patch or
> one patch per filesystem instead of those groups.

Hmm. I actually did that somewhat on purpose.

The reason? If there is something wrong, I want bisection to specify it
fairly well, and I was thinking that maybe there would be some filesystem
specific issue. I know, it's unlikely, but hey, if I knew when bugs
happened, I'd just make sure they never did..

So I wanted to keep the shmfs one separate, because people use that
filesystem independently of - and generally differently from - the other
on-disk ones, and maybe you could have a shmfs-specific bug but not a
filesystem-specific one (or vice versa). So if some application suddenly
breaks, anybody doing bisection would see which one it is.

Now, I could then have bundled up the rest of the filesystems as one
commit, or done them all as individual ones, and there I don't really have
any huge preferences. There were six filesystems that had "obviously just
the wrapper function" (done by just doing a

git grep -2 generic_permission

in case anybody cares), and quite frankly, if you do that grep, then the
ext* group stands out very clearly (next to each other, same indentation
due to filenames etc etc). So I just did them next.

And the third group was just "the rest". Not standing out in any
particular way, but also not worth doing individually in any particular
way. Bisectability in those groups doesn't much matter, because I don't
think it's all that likely that a machine that shows problems would run a
mix of filesystems, the way you'd mix shmfs with other filesystems and
other patterns.

> That beeing said if we go down this way I would prefer if we go
> down all the way, that is convert the remaining few filesystems that
> pass a check_acl argument to generic_permission (btrfs, gfs2, ocfs2)
> and just kill off that argument.

And I do agree, eventually. But the series was really meant to be a
standalone thing where I didn't force filesystems to change. I also hope
that some filesystems, like btrfs, that don't use the "trivial wrapper",
would look at _why_ they don't just use the trivial wrapper.

For some of them, they seem to have real major reasons not to just use
'generic_permission()' (eg fuse), while others - btrfs - seem to be just
odd, and eg have its own special case of btrfs-specific read-only crap.
Which is a real downer, since that MAY_WRITE case isn't even relevant for
pathname lookup, but even for other inodes I really don't see why btrfs
doesn't just use the generic VFS-layer "S_IMMUTABLE" bit.

Now, for your particular suggestion it doesn't matter (we'd just drop the
last argument, and have "generic_permission()" pick it up from the inode
ops), but the larger point was that I really didn't want to change
filesystem code unnecessarily. And I'd actually hope that eventually _no_
filesystem would use "generic_permission()" at all, and we'd just always
do that whole check_acl thing at a VFS level if the filesystem provides
acls.

Part of that is that I would also move the whole "get_cached_acl()" to
the VFS layer entirely - and I think Al has patches to this. So then we'd
only need to call some filesystem-specific "check_acl" in the cases where
we do _not_ already have cached ACL's - and then we can finally do all the
ACL checks without ever calling into the filesystem at all, which is
pretty much required if we want to do lockless lookups.

> After that there is another step we can easily go: as we now cache the
> ACLs in the generic inode instead of the per-fs one we can move the
> get_cached_acl call to your acl_permission_check helper (for gfs2/ocfs2
> that don't cache ACLs it will always fail), and not call out to the fs
> for the fast path at all.

Yes, see above. Al and I talked about this a couple of months ago, and
that's why he already moved the ACL lists to the generic inode in
preparation of this. The reason for that was that absolutely _all_
filesystems got the locking wrong (too much unnecessary locking for the
common case of a cached "no acl" case), and some of them got the caching
wrong (no caching at all).

The take-away message from this all is generally: "don't make individual
filesystems do even simple things - they'll do it wrong". It's not just
that the VFS layer will do it better, it's that if it's done in the VFS
layer we can be clever about locking in a way that we can _not_ be when we
have to rely on the filesystem writers being aware of subtle issues.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/