Re: [PATCH 0/8] VFS name lookup permission checking cleanup

From: Serge E. Hallyn
Date: Tue Sep 08 2009 - 10:21:42 EST


Quoting Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx):
>
> This is a series of eight trivial patches that I'd like people to take a
> look at, because I am hoping to eventually do multiple path component
> lookups in one go without taking the per-dentry lock or incrementing (and
> then decrementing) the per-dentry atomic count for each component.
>
> The aim would be to try to avoid getting that annoying cacheline ping-pong
> on the common top-level dentries that everybody looks up (ie root and home
> directories, /usr, /usr/bin etc).
>
> Right now I have some simple (but real) loads that show the contention on
> dentry->d_lock to be a roughly 3% performance hit on a single-socket
> nehalem, and I assume it can be much worse on multi-socket machines.
>
> And the thing is, it should be entirely possible to do everything but the
> last component lookup with just a single read_seqbegin()/read_seqretry()
> around the whole lookup. Yes, the last component is special and absolutely
> needs locking and counting - but the last component also doesn't tend to
> be shared, so locking it is fine.
>
> Now, I may never actually get there, but when looking at it, the biggest
> problem is actually not so much the path lookup itself, as the security
> tests that are done for each path component. And it should be noted that
> in order for a lockless seq-lock only lookup make sense, any such
> operations would have to be totally lock-free too. They certainly can't
> take mutexes etc, but right now they do.
>
> Those security tests fall into two categories:
>
> - actual security layer callouts: ima_path_check().
>
> This one looks totally pointless. Path component lookup is a horribly
> timing-critical path, and we will only do a successful lookup on a
> directory (inode needs to have a ->lookup operation), yet in the middle
> of that is a call to "ima_path_check()".
>
> Now, it looks like ima_path_check() is very much designed to only check
> the _final_ path anyway, and was never meant to be used to check the
> directories we hit on the way. In fact, the whole function starts with
>
> if (!ima_initialized || !S_ISREG(inode->i_mode))
> return 0;
>
> so it's totally pointless to do that thing on a directory where
> that !S_ISREG() test will trigger.
>
> So just remove it. IMA should never have put that check in there to
> begin with, it's just way too performance-sensitive.
>
> - the real filesystem permission checks.
>
> We used to do the common case entirely in the VFS layer, but these days
> the common case includes POSIX ACL checking, and as a result, the
> trivial short-circuit code in the VFS layer almost never triggers in
> practice, and we call down to the low-level filesystem for each
> component.
>
> We can't fix that by just removing the call, but what we _can_ do is to
> at least avoid the silly calling back-and-forth: most filesystems will
> just call back to the VFS layer to do the "generic_permission()" with
> their own ACL-checking routines.
>
> That way we can flatten the call-chain out a bit, and avoid one
> unnecessary indirect call in that timing-critical region. And
> eventually, if we make the whole ACL caching thing be something that we
> do at a VFS layer (Al Viro already worked on _some_ of that), we'll be
> able to avoid the calls entirely when we can see the cached ACL
> pointers directly.
>
> So this series of 8 patches do all these preliminary things. As shown by
> the diffstat below, it actually reduces the lines of code (mainly by just
> removing the silly per-filesystem wrappers around "generic_permission()")
> and it also makes it a _lot_ clearer what actually gets called in that
> whole 'exec_permission_lite()' function that we use to check the
> permission of a pathname lookup.
>
> Comments? Especially from the IMA people (first patch) and from generic
> VFS, security and low-level FS people (the 'Simplify exec_permission_lite'
> series, and then the check_acl + per-filesystem changes).
>
> Al?
>
> I'm looking to merge these shortly after 2.6.31 is released, but comments
> welcome.

All of them seem good, and I don't see any thinkos, no resulting skipped
checks or anything.

Acked-by: Serge Hallyn <serue@xxxxxxxxxx>

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/