Re: [RFC] syscall calling convention, stts/clts, and xstate latency

From: Andrew Lutomirski
Date: Sun Jul 24 2011 - 23:22:26 EST


On Sun, Jul 24, 2011 at 6:34 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>
> I had in mind something a little less ambitious: making
> kernel_fpu_begin very fast, especially when used more than once.
> Currently it's slow enough to have spawned arch/x86/crypto/fpu.c,
> which is a hideous piece of infrastructure that exists solely to
> reduce the number of kernel_fpu_begin/end pairs when using AES-NI.
> Clobbering registers in syscall would reduce the cost even more, but
> it might require having a way to detect whether the most recent kernel
> entry was via syscall or some other means.

I think it will be very hard to inadvertently cause a regression,
because the current code looks pretty bad.

1. Once a task uses xstate for five timeslices, the kernel decides
that it will continue using it. The only thing that clears that
condition is __unlazy_fpu called with TS_USEDFPU set. The only way I
can see for that to happen is if kernel_fpu_begin is called twice in a
row between context switches, and that has little to do with the task's
xstate usage.
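To make the heuristic concrete, here is a toy model (not kernel code) of point 1: each timeslice in which a task uses the FPU bumps fpu_counter, and once the counter passes 5 the task is treated as an xstate user at every switch-in. The field name fpu_counter and the threshold of 5 come from the kernel; the functions wrapping them are illustrative assumptions only.

```c
#include <assert.h>

/* Toy model of the fpu_counter heuristic.  The struct and the
 * helpers are hypothetical simplifications; only the field name
 * and the "> 5" threshold mirror the actual kernel. */
struct task {
    int fpu_counter;        /* timeslices in which the task used xstate */
};

/* Model of the per-timeslice accounting. */
static void account_timeslice(struct task *t, int used_fpu)
{
    if (used_fpu)
        t->fpu_counter++;
    else
        t->fpu_counter = 0; /* illustrative; the real clearing path is
                             * __unlazy_fpu, as noted above */
}

/* Mirrors the fpu_counter > 5 test done at context switch. */
static int preload_fpu(const struct task *t)
{
    return t->fpu_counter > 5;
}
```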

2. __switch_to, when switching to a task with fpu_counter > 5, will do
stts() immediately followed by clts().

The combination means that when switching between two xstate-using
tasks (or even tasks that were once xstate-using), we pay the full
price of a state save/restore *and* stts/clts.
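A sketch of that combined cost, again as a standalone toy model rather than the real __switch_to: it just counts the expensive operations on a switch between two FPU-heavy tasks. The names stts, clts, and fpu_counter are from the kernel; the switch function and counters are assumptions for illustration.

```c
#include <assert.h>

/* Counters standing in for the expensive operations. */
static int cr0_ts_writes;
static int xstate_saves;
static int xstate_restores;

static void stts(void) { cr0_ts_writes++; }  /* model: set CR0.TS */
static void clts(void) { cr0_ts_writes++; }  /* model: clear CR0.TS */

struct task { int fpu_counter; };

/* Hypothetical sketch of the switch path for prev and next both
 * past the fpu_counter threshold: save prev's state and stts()
 * (the __unlazy_fpu side), then clts() and restore to preload
 * next's state. */
static void switch_to(struct task *prev, struct task *next)
{
    (void)prev;
    xstate_saves++;
    stts();

    if (next->fpu_counter > 5) {
        clts();
        xstate_restores++;
    }
}
```

Running one such switch yields a full save plus a full restore *and* two writes to CR0.TS, which is the redundancy the mail is pointing at.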

--Andy