Re: RFC: unlazy fpu for frequent fpu users

From: Chuck Ebbert
Date: Sat Jul 01 2006 - 07:54:34 EST


In-Reply-To: <1151705536.11434.69.camel@xxxxxxxxxxxxxxxxxxxxx>

On Sat, 01 Jul 2006 00:12:16 +0200, Arjan van de Ven wrote:

> right now the kernel on x86-64 has a 100% lazy fpu behavior: after
> *every* context switch a trap is taken for the first FPU use to restore
> the FPU context lazily. This is of course great for applications that
> have very sporadic or no FPU use (since then you avoid doing the
> expensive save/restore all the time). However for very frequent FPU
> users... you take an extra trap every context switch.
>
> The patch below adds a simple heuristic to this code: After 5
> consecutive context switches of FPU use, the lazy behavior is disabled
> and the context gets restored every context switch. If the app indeed
> uses the FPU, the trap is avoided. (the chance of the 6th time slice
> using FPU after the previous 5 having done so are quite high obviously).
>

You can do better that that. FXSR doesn't destroy the FPU contents; if
you track the context carefully you can completely avoid the restore.
This requires keeping a per-cpu variable that holds a pointer to the
thread that last used the FPU and a per-thread variable containing the
CPU number on which the thread last used FP math. Unfortunately this
won't work in x86_64 because of the 'fxsave information leak' workaround.

> --- linux-2.6.17-sleazyfpu.orig/arch/x86_64/kernel/process.c
> +++ linux-2.6.17-sleazyfpu/arch/x86_64/kernel/process.c
> @@ -515,6 +515,9 @@ __switch_to(struct task_struct *prev_p,
> int cpu = smp_processor_id();
> struct tss_struct *tss = &per_cpu(init_tss, cpu);
>
> + /* prefetch the fxsave area into the cache */
> + prefetch(&next->i387.fxsave);
> +
> /*
> * Reload esp0, LDT and the page table pointer:
> */

This prefetch is probably a bad idea. I ported your patch to i386 and it was
actually slower until I changed it:

+ if (next_p->fpu_counter > 5)
+ /* prefetch the fxsave area into the cache */
+ prefetch(&next->i387.fxsave);
+

Now it's ~.4% faster. The test was an FP program doing a simple benchmark
while a non-FP program ran in a tight loop.

--
Chuck
"You can't read a newspaper if you can't read." --George W. Bush
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/