Re: [PATCH v2 2/2] vfs: avoid duplicating creds in faccessat if possible

From: Mateusz Guzik
Date: Tue Jan 24 2023 - 05:16:48 EST


On 1/23/23, Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> On Fri, Jan 20, 2023 at 7:50 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>> On 1/20/23, Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>> > On Mon, Jan 16, 2023 at 4:21 PM Mateusz Guzik <mjguzik@xxxxxxxxx>
>> > wrote:
>> >>
>> >> access(2) remains commonly used, for example on exec:
>> >> access("/etc/ld.so.preload", R_OK)
>> >>
>> >> or when running gcc: strace -c gcc empty.c
>> >> % time seconds usecs/call calls errors syscall
>> >> ------ ----------- ----------- --------- --------- ----------------
>> >> 0.00 0.000000 0 42 26 access
>> >>
>> >> It falls down to do_faccessat without the AT_EACCESS flag, which in
>> >> turn
>> >> results in allocation of new creds in order to modify fsuid/fsgid and
>> >> caps. This is a very expensive process single-threaded and most
>> >> notably
>> >> multi-threaded, with numerous structures getting refed and unrefed on
>> >> imminent new cred destruction.
>> >>
>> >> Turns out for typical consumers the resulting creds would be identical
>> >> and this can be checked upfront, avoiding the hard work.
>> >>
>> >> An access benchmark plugged into will-it-scale running on Cascade Lake
>> >> shows:
>> >> test proc before after
>> >> access1 1 1310582 2908735 (+121%) # distinct files
>> >> access1 24 4716491 63822173 (+1353%) # distinct files
>> >> access2 24 2378041 5370335 (+125%) # same file
>> >
>> > Out of curiosity, do you have any measurements of the impact this
>> > patch has on the AT_EACCESS case when the creds do need to be
>> > modified?
>>
>> I could not be arsed to bench that. I'm not saying there is literally 0
>> impact, but it should not be high and the massive win in the case I
>> patched imho justifies it.
>
> That's one way to respond to an honest question asking if you've done
> any tests on the other side of the change. I agree the impact should
> be less than the advantage you've shown, but sometimes it's nice to
> see these things.
>

So reading this now I do think it was worded in quite a poor manner, so
apologies for that.

Wording aside, I don't know whether this is just a passing remark or
are you genuinely concerned about the other case.

If you are, I noted there is an immediately achievable speed up by
eliminating the get/put ref cycle on creds coming from override_creds +
put_cred to backpedal from it. This should be enough to cover it, but
there are cosmetic problems around it I don't want to flame over.

Say override_creds_noref gets added doing the usual work, except for
get_new_cred.

Then override_creds would be:
validate_creds(new);
get_new_cred((struct cred *)new);
override_creds_noref(new);

But override_creds_noref would retain validate_creds new/old and the
above would repeat it which would preferably be avoided. Not a problem
if it is deemed ok to get_new_cred without validate_creds.

>> These funcs are literally next to each other, I don't think that is easy
>> to miss. I concede a comment in access_override_creds to take a look at
>> access_need_override_creds would not hurt, but I don't know if a resend
>> to add it is justified.
>
> Perhaps it's because I have to deal with a fair amount of code getting
> changed in one place and not another, but I would think that a comment
> would be the least one could do here and would justify a respin.
>

I'm not going to *insist* on not adding that comment.

Would this work for you?

diff --git a/fs/open.c b/fs/open.c
index 3c068a38044c..756177b94b04 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -407,6 +407,11 @@ static const struct cred *access_override_creds(void)
if (!override_cred)
return NULL;

+ /*
+ * XXX access_need_override_creds performs checks in hopes of
+ * skipping this work. Make sure it stays in sync if making any
+ * changes here.
+ */
override_cred->fsuid = override_cred->uid;
override_cred->fsgid = override_cred->gid;

if not, can you phrase it however you see fit for me to copy-paste?

> In my opinion a generalized shallow copy approach has more value than
> a one-off solution that has the potential to fall out of sync and
> cause a problem in the future (I recognize that you disagree on the
> likelihood of this happening).
>

To reiterate my stance, the posted patch is trivial to reason about
and it provides a marked improvement for the most commonly seen case.
It comes with some extra branches for the less common case, which I
don't consider to be a big deal.

>From the quick toor I took around kernel/cred.c I think the cred code
is messy and it would be best to sort it out before doing anything
fancy. I have no interest in doing the clean up.

The shallow copy idea I outlined above looks very simple, but I can't
help the feeling there are surprises there, so I'm reluctant to roll
with it as is.

More importantly I can't show any workload which runs into the other
case, thus if someone asks me to justify the complexity I will have
nothing, which is mostly why I did not go for it.

That said, if powers to be declare this is the way forward, I can
spend some time getting it done.

> Ultimately it's a call for the VFS folks as they are responsible for
> the access() code.
>

Well let's wait and see. :)

--
Mateusz Guzik <mjguzik gmail.com>