Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)

From: Linus Torvalds
Date: Wed Feb 12 2003 - 00:54:37 EST

In article <>,
Jamie Lokier <> wrote:
>A cute and wonderful hack is to use the 6 words in the TSS prior to
>&tss->es as the trampoline. Now that __switch_to is done in software,
>those words are not used for anything else.


That's not cute and wonderful, that's _horrible_.

Mixing data and code on the same page is very very slow on a P4 (well, I
think it's "same half-page", but the point is that you should not EVER
mix data and code - it ends up being slow on modern CPUs).

>Other fixed offsets from &tss->esp0 are possible - especially nice
>would be to share a cache line with the GDT's hot cache line. (To do
>this, place GDT before TSS, make KERNEL_CS near the end of the GDT,
>and then the accesses to GDT, trampoline and tss->esp0 will all touch
>the same cache line if you're lucky).

Since almost all x86 CPUs have some kind of cacheline exclusion policy
between the I$ and the D$ (to handle the strict x86 I$ coherency
requirements), your "if you're lucky" is completely bogus. In fact,
you'd get the _pessimal_ cache behaviour for something like that, ie you
get lines that ping-pong between the L2 and the two instruction caches.

Don't do it. Keep data and code on separate pages.
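
As a rough illustration of the rule above (not code from the kernel), the
sketch below checks whether two addresses share a page or a cache line, and
uses compile-time assertions to pin down two layouts. The struct names, the
same_block() helper, the 4096-byte page size and the 64-byte line size are
all assumptions made for the example; the byte arrays only stand in for
trampoline code and hot data.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t */

#define PAGE_SIZE       4096u  /* assumed i386 page size */
#define CACHE_LINE_SIZE 64u    /* assumed line size; differs between CPUs */

/* Return 1 if both addresses lie in the same naturally aligned block of
 * `size` bytes. `size` must be a power of two. */
static int same_block(uintptr_t a, uintptr_t b, uintptr_t size)
{
    return (a & ~(size - 1)) == (b & ~(size - 1));
}

/* Hypothetical layout that follows the advice: the bytes standing in for
 * the trampoline start on their own page, away from the hot data. */
struct separated_layout {
    _Alignas(4096) unsigned char hot_data[64];
    _Alignas(4096) unsigned char trampoline[24];  /* 6 words of 4 bytes */
};

/* Hypothetical layout of the kind being criticised: trampoline bytes and
 * data packed into a single cache line. */
struct mixed_layout {
    _Alignas(64) unsigned char trampoline[24];
    unsigned char hot_data[40];
};

/* Compile-time checks on the two layouts. */
static_assert(offsetof(struct separated_layout, trampoline) % PAGE_SIZE == 0,
              "trampoline must start on a page boundary");
static_assert(sizeof(struct mixed_layout) == CACHE_LINE_SIZE,
              "mixed layout is expected to fill exactly one line");

/* Return 1 if the data and the trampoline bytes are on different pages. */
static int separated_ok(const struct separated_layout *s)
{
    return !same_block((uintptr_t)s->hot_data,
                       (uintptr_t)s->trampoline, PAGE_SIZE);
}

/* Return 1 if the data and the trampoline bytes share one cache line. */
static int mixed_shares_line(const struct mixed_layout *m)
{
    return same_block((uintptr_t)m->trampoline,
                      (uintptr_t)m->hot_data, CACHE_LINE_SIZE);
}
```

For a statically allocated instance of each struct, separated_ok() should
return 1 and mixed_shares_line() should return 1. The second case is the
one to avoid: instruction fetch and data access would hit the same line.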
