RE: Lazy FPU restoration / moving kernel_fpu_end() to context switch

From: Thomas Gleixner
Date: Tue Jun 19 2018 - 09:09:06 EST


On Tue, 19 Jun 2018, David Laight wrote:
> From: Andy Lutomirski
> > Sent: 15 June 2018 19:54
> > On Fri, Jun 15, 2018 at 11:50 AM Dave Hansen
> > <dave.hansen@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On 06/15/2018 11:31 AM, Andy Lutomirski wrote:
> > > > for (thing) {
> > > >         kernel_fpu_begin();
> > > >         encrypt(thing);
> > > >         kernel_fpu_end();
> > > > }
> > >
> > > Don't forget that the processor has optimizations for this, too. The
> > > "modified optimization" will notice that between:
> > >
> > > kernel_fpu_end(); -> XRSTOR
> > > and
> > > kernel_fpu_begin(); -> XSAVE(S|OPT)
> > >
> > > the processor has not modified the states. It'll skip doing any writes
> > > of the state. Doing what Andy is describing is still way better than
> > > letting the processor do it, but you should just know up front that this
> > > may not be as much of a win as you would expect.
> >
> > Even with the modified optimization, kernel_fpu_end() still needs to
> > reload the state that was trashed by the kernel FPU use. If the
> > kernel is using something like AVX512 state, then kernel_fpu_end()
> > will transfer an enormous amount of data no matter how clever the CPU
> > is. And I think I once measured XSAVEOPT taking a hundred cycles or
> > so even when RFBM==0, so it's not exactly super fast.
>
> If the kernel was entered by a system call do you need to save the AVX512
> state at all?
> IIRC the registers are all defined as 'caller saved' so there is no expectation
> that they will be saved across the syscall wrapper function call.
> All you need to do is ensure that 'kernel' values aren't passed back to userspace.
> There is a single instruction to zero all the AVX512 registers.

Then we need different treatment for exception entries and consecutive
preemption. Lots of corner cases to cover ...
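
To make the trade-off in this thread concrete, here is a toy user-space
model that only counts modelled save and restore operations for the loop
quoted above. It is a sketch under loud assumptions: every name in it
(fpu_model, model_fpu_begin(), model_return_to_user() and so on) is
hypothetical and not a kernel API, it ignores the XSAVEOPT modified
optimization Dave mentions, and it deliberately leaves out exactly the
exception-entry and preemption cases mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are real kernel interfaces. */
struct fpu_model {
	bool user_state_loaded;	/* user FPU registers are live in the CPU */
	unsigned long saves;	/* modelled XSAVE-style operations */
	unsigned long restores;	/* modelled XRSTOR-style operations */
};

/* Save the user state once, before kernel code clobbers the registers. */
static void model_fpu_begin(struct fpu_model *m)
{
	if (m->user_state_loaded) {
		m->saves++;
		m->user_state_loaded = false;
	}
}

/* Eager variant: reload the user state at every end call. */
static void model_fpu_end_eager(struct fpu_model *m)
{
	m->restores++;
	m->user_state_loaded = true;
}

/* Deferred variant: do nothing here, the restore happens later. */
static void model_fpu_end_lazy(struct fpu_model *m)
{
	(void)m;
}

/* Restore on the way back to user space if it is still pending. */
static void model_return_to_user(struct fpu_model *m)
{
	if (!m->user_state_loaded) {
		m->restores++;
		m->user_state_loaded = true;
	}
}

/* Run n begin/end pairs, then return to user space once. */
static struct fpu_model run_loop(int n, bool lazy)
{
	struct fpu_model m = { .user_state_loaded = true };

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		model_fpu_begin(&m);
		/* encrypt(thing) would run here */
		if (lazy)
			model_fpu_end_lazy(&m);
		else
			model_fpu_end_eager(&m);
	}
	model_return_to_user(&m);
	return m;
}
```

In this model, run_loop(n, false) performs n saves and n restores, while
run_loop(n, true) performs at most one of each. Real hardware and the real
entry paths are more complicated than that, which is the point of the
corner cases.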

Thanks,

tglx