Re: [RFC PATCH] x86 alternatives : fix LOCK_PREFIX race with preemptible kernel and CPU hotplug

From: Linus Torvalds
Date: Thu Aug 14 2008 - 12:17:09 EST

On Thu, 14 Aug 2008, Mathieu Desnoyers wrote:
>
> I can't argue about the benefit of using VM CPU pinning to manage
> resources because I don't use it myself, but I ran some tests out of
> curiosity to find if uncontended locks were that cheap, and it turns out
> they aren't.

Absolutely.

Locked ops show up not just in microbenchmarks looping over the
instruction; they show up in "real" benchmarks too. We added a single
locked instruction (maybe it was two) to the page fault handling code some
time ago, and the reason I noticed it was that it actually made the page
fault cost visibly more expensive in lmbench. That was a _single_
instruction in the hot path (or maybe two).
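
For reference, the kind of microbenchmark that just loops over the
instruction might look roughly like the sketch below. It is only an
illustration, not code from this thread: the helper names (now_ns,
bench_plain, bench_locked) are made up, it runs in user space with C11
atomics rather than kernel primitives, and it assumes the compiler emits
a lock-prefixed instruction for the atomic increment, which is the usual
result on x86.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds (hypothetical helper). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Baseline: plain read-modify-write increment. volatile keeps the
 * compiler from collapsing the loop into a single addition.
 * Returns elapsed nanoseconds; the final count goes to *result.
 */
static uint64_t bench_plain(uint64_t iters, uint64_t *result)
{
    volatile uint64_t counter = 0;
    uint64_t start = now_ns();

    for (uint64_t i = 0; i < iters; i++)
        counter++;

    uint64_t end = now_ns();
    *result = counter;
    return end - start;
}

/*
 * Uncontended atomic increment. Only this thread touches the counter,
 * so any extra cost is the locked op itself, not cache-line bouncing.
 * Returns elapsed nanoseconds; the final count goes to *result.
 */
static uint64_t bench_locked(uint64_t iters, uint64_t *result)
{
    _Atomic uint64_t counter = 0;
    uint64_t start = now_ns();

    for (uint64_t i = 0; i < iters; i++)
        atomic_fetch_add_explicit(&counter, 1, memory_order_seq_cst);

    uint64_t end = now_ns();
    *result = atomic_load(&counter);
    return end - start;
}
```

Calling both with the same large iteration count and dividing each
elapsed time by that count gives a rough per-operation cost in
nanoseconds; turning that into cycles needs the clock frequency of the
machine it ran on. A loop like this only shows the cost in isolation,
which is exactly why the lmbench result matters more.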

And the page fault path is some of the most timing critical in the whole
kernel - if you have everything cached, the cost of doing the page faults
to populate new processes for some fork/exec-heavy workload (and compiling
the kernel is just one of those - any traditional unix behaviour will show
this) is critical.

This is one of the things AMD does a _lot_ better than Intel. Intel tends
to have a 30-50 cycle cost (with later P4s being *much* worse), while AMD
tends to have a cost of around 10-15 cycles.

It's one of the things Intel promises to have improved in the next-gen
uarch (Nehalem), and while I am not supposed to give out any benchmarks, I
can confirm that Intel is getting much better at it. But it's going to be
visible still, and it's really a _big_ issue on P4.

(Of course, on P4, the page fault exception cost itself is so high that
the cost of atomics may be _relatively_ less noticeable in that particular
path)

Linus