Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch

From: Peter Zijlstra
Date: Mon Jun 18 2018 - 05:44:59 EST


On Fri, Jun 15, 2018 at 10:30:46PM +0200, Jason A. Donenfeld wrote:
> On Fri, Jun 15, 2018 at 9:34 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > Didn't we recently do a bunch of crypto patches to help with this?
> >
> > I think they had the pattern:
> >
> > kernel_fpu_begin();
> > for (units-of-work) {
> > do_unit_of_work();
> > if (need_resched()) {
> > kernel_fpu_end();
> > cond_resched();
> > kernel_fpu_begin();
> > }
> > }
> > kernel_fpu_end();
>
> Right, so that's the thing -- this is an optimization easily available
> to individual crypto primitives. But I'm interested in applying this
> kind of optimization to an entire queue of, say, tiny packets, where
> each packet is processed individually. Or, to a cryptographic
> construction, where several different primitives are used, such that
> it'd be meaningful not to have to get the performance hit of
> end()begin() in between each and everyone of them.

I'm confused.. how does the above not apply to your situation?