Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

From: Mathieu Desnoyers
Date: Thu Dec 17 2009 - 15:26:36 EST

* Christoph Lameter (cl@xxxxxxxxxxxxxxxxxxxx) wrote:
> > However, I would need:
> >
> > this_cpu_cmpxchg(scalar, oldv, newv)
> > (maps to x86 cmpxchg)
> >
> > this_cpu_add_return(scalar, value)
> > (maps to x86 xadd)
> >
> > too. Is that a planned addition ?
> It was not necessary. It's easy to add though.
> > (while we are at it, we might as well add the xchg instruction,
> > although it has an implied LOCK prefix on x86).
> Well yeah, that's a thorny one. One could use the cmpxchg instead?

Yes, although maybe it would make sense to encapsulate it in an xchg
primitive anyway, in case some architecture has a better xchg than x86.
For instance, powerpc, with its load-linked/store-conditional
instructions, can skip a comparison for xchg that's otherwise required
for cmpxchg.
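
To illustrate what "use the cmpxchg instead" amounts to, here is a
minimal userspace sketch, not kernel code. The helper name
xchg_via_cmpxchg is hypothetical, and it uses C11 <stdatomic.h> rather
than the this_cpu/local primitives discussed here. Note that on x86 the
C11 compare-exchange compiles to a LOCK-prefixed cmpxchg, so this only
shows the shape of the retry loop, not the non-LOCK per-cpu variant.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper (not a kernel API): an exchange built from a
 * compare-and-swap retry loop. Returns the value that was replaced.
 */
static inline long xchg_via_cmpxchg(_Atomic long *ptr, long newv)
{
	long old = atomic_load_explicit(ptr, memory_order_relaxed);

	/*
	 * On failure, the C11 compare-exchange writes the value it found
	 * into 'old', so each retry compares against a fresh snapshot.
	 */
	while (!atomic_compare_exchange_weak_explicit(ptr, &old, newv,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;

	return old;
}
```

The comparison in the loop is the extra work a native xchg would not
need, since an exchange stores unconditionally.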

A quick test on my Intel Xeon E5405:

local cmpxchg: 14 cycles
xchg: 18 cycles

So yes, indeed, the non-LOCK prefixed local cmpxchg seems a bit faster
than the xchg, given the latter has an implied LOCK prefix.

Code used for local cmpxchg:

	old = var;
	do {
		ret = cmpxchg_local(&var, old, 4);
		if (likely(ret == old))
			break;	/* swap succeeded */
		old = ret;	/* value changed under us, retry with it */
	} while (1);



