Re: [PATCH RESEND v3 6/6] powerpc/signal: Use unsafe_copy_siginfo_to_user()

From: Christophe Leroy
Date: Tue Sep 14 2021 - 10:01:03 EST




Le 13/09/2021 à 21:11, Eric W. Biederman a écrit :
Christophe Leroy <christophe.leroy@xxxxxxxxxx> writes:

Le 13/09/2021 à 18:21, Eric W. Biederman a écrit :
ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:

Christophe Leroy <christophe.leroy@xxxxxxxxxx> writes:

Use unsafe_copy_siginfo_to_user() in order to do the copy
within the user access block.

On an mpc 8321 (book3s/32) the improvment is about 5% on a process
sending a signal to itself.

If you can't make function calls from an unsafe macro there is another
way to handle this that doesn't require everything to be inline.

From a safety perspective it is probably even a better approach.

Yes but that's exactly what I wanted to avoid for the native ppc32 case: this
double hop means useless pressure on the cache. The siginfo_t structure is 128
bytes large, that means 8 lines of cache on powerpc 8xx.

But maybe it is acceptable to do that only for the compat case. Let me think
about it, it might be quite easy.

The places get_signal is called tend to be well known. So I think we
are safe from a capacity standpoint.

I am not certain it makes a difference in capacity as there is a high
probability that the stack was deeper recently than it is now which
suggests the cache blocks might already be in the cache.

My sense it is worth benchmarking before optimizing out the extra copy
like that.

On the extreme side there is simply building the entire sigframe on the
stack and then just calling it copy_to_user. As the stack cache lines
are likely to be hot, and copy_to_user is quite well optimized
there is a real possibility that it is faster to build everything
on the kernel stack, and then copy it to the user space stack.

It is also possible that I am wrong and we may want to figure out how
far up we can push the conversion to the 32bit siginfo format.

If could move the work into collect_signal we could guarantee there
would be no extra work. That would require adjusting the sigframe
generation code on all of the architectures.

There is a lot we can do but we need benchmarking to tell if it is
worth it.



Sure, I'm benchmarking all the work I have been doing on signal code with the following simple app that I run with 'perf stat':

#include <stdlib.h>
#include <signal.h>

void sigusr1(int sig) { }

int main(int argc, char **argv)
{
int i = 100000;

signal(SIGUSR1, sigusr1);
for (;i--;)
raise(SIGUSR1);
exit(0);
}


On an mpc8321 a 32 bits powerpc with KUAP enabled (KUAP is equivalent of x86 SMAP)

Before changing copy_siginfo_to_user() to unsafe_copy_siginfo_to_user(), 'perf stat' reports 1983 msec (task-clock)

After my change I get 1900 msec.

With your approach I get 1930 msec, so we are loosing 36% of the benefit of converting to the 'unsafe_' alternative.

So I think it is worth it.

Christophe