Re: [PATCH 3.1?] x86: Remove useless stts/clts pair in __switch_to

From: Andrew Lutomirski
Date: Mon Jul 25 2011 - 09:05:25 EST


On Mon, Jul 25, 2011 at 7:12 AM, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Andy Lutomirski <luto@xxxxxxx> wrote:
>
>> An stts/clts pair takes over 70 ns by itself on Sandy Bridge, and
>> when other things are going on it's apparently even worse.  This
>> saves 10% on context switches between threads that both use extended
>> state.
>>
>> Signed-off-by: Andy Lutomirski <luto@xxxxxxx>
>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxx>,
>> Cc: Avi Kivity <avi@xxxxxxxxxx>
>> ---
>>
>> This is not as well tested as it should be (especially on 32-bit, where
>> I haven't actually tried compiling it), but I think this might be 3.1
>> material so I want to get it out for review before it's even more
>> unjustifiably late :)
>>
>> Argument for inclusion in 3.1 (after a bit more testing):
>>  - It's dead simple.
>>  - It's a 10% speedup on context switching under the right conditions [1]
>>  - It's unlikely to slow any workload down, since it doesn't add any work
>>    anywhere.
>>
>> Argument against:
>>  - It's late.
>
> I think it's late.
>
> Would be much better to stick it into the x86/xsave tree I pointed to
> and treat and debug it as a coherent unit. FPU bugs need a lot of
> time to surface so we definitely do not want to fast-track it. In
> fact if we want it in v3.2 we should start assembling the tree right
> now.

Fair enough. I make no guarantee that I'll have anything ready in
less than a few weeks. I'm defending my thesis in a week, and kernel
hacking is entirely a distraction. :) (The only thing my thesis has
to do with operating systems is that I mention recvmmsg.)

>
> Also, if you are tempted by the prospect of possibly enabling vector
> instructions for the x86 kernel, we could try that too, and get
> multiple speedups for the price of having to debug the tree only once
> ;-)

I'll play with it. I have some other cleanup / speedup ideas, too,
and I'll see where they go. Given that the kernel doesn't really use
floating-point math, I'm not sure that gcc will do much unless we turn
on -ftree-vectorize, and that's a little scary.

--Andy