Re: 2.5isms

From: Nick Piggin
Date: Sun Jan 02 2005 - 19:46:44 EST


Andi Kleen wrote:
Nick Piggin <nickpiggin@xxxxxxxxxxxx> writes:

even non HT CPUs possibly slightly more efficient WRT caching the stacks of
multiple processes?


Not on x86 no because they normally have physically indexed caches
(except for L1, but that is not really preserved over a context switch)
HT is just a special case because two threads essentially share cache.

In theory it could help on non x86 CPUs with virtually indexed caches,
but it is doubtful if they don't need more advanced forms of cache colouring.


That makes sense. I wonder if those architectures may just want to
implement it anyway. If this is such a win here, then it may be low
hanging fruit for those architectures.

But I guess there is something fundamentally a bit different when you
have two processes competing for L1 cache *at the same time*.


Second, on what workloads does performance suffer, can you remember? I wonder
if natural variations in the stack pointer as the program runs would mitigate
the effect of this on all but micro benchmarks?


iirc on lots of different workloas that run code on both virtual
CPUs at the same time. Without it you would get L1 cache thrashing,
which can slow things down quite a lot.

And yes it made a real difference. The P4 cache have some pecularities
("64K aliasing") that made the problem worse.


Interesting, thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/