Re: [patch 2.6.13-rc3a] i386: inline restore_fpu

From: Andrew Morton
Date: Thu Jul 21 2005 - 22:29:22 EST


Chuck Ebbert <76306.1226@xxxxxxxxxxxxxx> wrote:
>
>
> This patch makes restore_fpu() an inline. When L1/L2 cache are saturated
> it makes a measurable difference.
>
> Results from profiling Volanomark follow. Sample rate was 2000 samples/sec
> (HZ = 250, profile multiplier = 8) on a dual-processor Pentium II Xeon.
>
>
> Before:
>
>  10680 restore_fpu           333.7500
>   8351 device_not_available  203.6829
>   3823 math_state_restore     59.7344
>  -----
>  22854
>
>
> After:
>
>  12534 math_state_restore    130.5625
>   8354 device_not_available  203.7561
>  -----
>  20888
>
>
> Patch is "obviously correct" and cuts 9% of the overhead. Please apply.
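
As a quick check on the quoted figures, the sketch below recomputes the sample rate and the size of the reduction from the totals in the two profiles. The helper names are made up for illustration; only the numbers come from the post.

```c
/* Hypothetical helpers for checking the arithmetic in the quoted profile.
 * Only the input numbers (250, 8, 22854, 20888) come from the post. */

/* Profiling samples per second: timer frequency times profile multiplier. */
static long sample_rate(long hz, long multiplier)
{
    return hz * multiplier;
}

/* Percentage by which the sample count dropped from `before` to `after`. */
static double pct_reduction(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

With HZ = 250 and a multiplier of 8, sample_rate() gives the stated 2000 samples/sec. pct_reduction(22854, 20888) comes out at roughly 8.6, which the post rounds to 9%.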

hm. What context switch rate is that thing doing?

Is the benchmark actually doing floating point stuff?

We do have the `used_math' optimisation in there which attempts to avoid
doing the FP save/restore if the app isn't actually using math. But
<ancient recollections> there's code in glibc startup which always does a
bit of float, so that optimisation is always defeated. There was some
discussion about periodically setting tasks back into !used_math state to
try to restore the optimisation for tasks which only do a little bit of FP,
but nothing actually got done.
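
To make the mechanism concrete, here is a toy user-space model of lazy FPU restore. It is a sketch under loud assumptions: every name prefixed toy_ is invented for illustration, the flag only imitates the trap-on-first-FP-instruction behaviour, and none of this is the kernel's actual code.

```c
#include <stdbool.h>

/* Toy model only: all toy_* names are hypothetical, not kernel symbols. */
struct toy_task {
    bool used_math;   /* task has FP state worth restoring */
    int  restores;    /* times saved FP state was reloaded */
    int  inits;       /* times a fresh FP state was set up */
};

static bool toy_ts_set = true;  /* when set, the next FP instruction traps */
static int  toy_traps;          /* device-not-available traps taken so far */

/* Context switch: arm the trap instead of eagerly loading FP state. */
static void toy_switch_to(struct toy_task *next)
{
    (void)next;
    toy_ts_set = true;
}

/* Trap handler: load FP state only once the task really needs it. */
static void toy_math_state_restore(struct toy_task *t)
{
    toy_traps++;
    toy_ts_set = false;
    if (t->used_math) {
        t->restores++;        /* reload previously saved state */
    } else {
        t->inits++;           /* first FP use: start from clean state */
        t->used_math = true;
    }
}

/* The task executes an FP instruction; it traps only if the flag is set. */
static void toy_fp_insn(struct toy_task *t)
{
    if (toy_ts_set)
        toy_math_state_restore(t);
}
```

In this model a task that never calls toy_fp_insn() costs nothing on a switch. A task that touched FP even once, as with the glibc startup code mentioned above, keeps used_math set, so each later FP use after a switch takes the trap and pays for a restore.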

> Next step should be to physically place math_state_restore() after
> device_not_available(). Would such a patch be accepted? (Yes it
> would be ugly and require linker script changes.)

Depends on the benefit/ugly ratio ;)
