Re: [patch][rfc] x86, mutex: non-atomic unlock (and a rant)

From: Nick Piggin
Date: Mon Nov 02 2009 - 11:00:37 EST

On Mon, Nov 02, 2009 at 07:20:08AM -0800, Linus Torvalds wrote:
> On Mon, 2 Nov 2009, Nick Piggin wrote:
> >
> > Non-atomic unlock for mutexes, maybe? I do this by relying on cache
> > coherence on a cacheline basis for ordering rather than the memory
> > consistency of the x86. Linus, I know you've told me this is an incorrect
> > assumption in the past, but I'm not so sure.
> I'm sure.
> This is simply buggy:
> > +     atomic_set(&lock->count, 1);
> > +     barrier();
> > +     if (unlikely(lock->waiters))
> > +             fail_fn(lock);
> because it doesn't matter one whit whether 'lock->count' and
> 'lock->waiters' are in the same cacheline or not.
> The cache coherency deals in cachelines, but the instruction re-ordering
> logic does not. It's entirely possible that the CPU will turn this into
>     tmp = lock->waiters;
>     ...
>     atomic_set(&lock->count, 1);
>     if (tmp)
>         fail_fn(lock);
> and your "barrier()" did absolutely nothing.
> The fact that it may _work_ in almost all circumstances (and perhaps even
> "always" on some microarchitectures) is irrelevant. It's simply not
> guaranteed to work. Yes, you need just the right timings, and yes, it's
> probably hard to hit. And yes, I can well imagine that some micro-
> architecture will even guarantee the write->read ordering, and that it
> would _always_ work on that micro-architecture.
> But I can see your thing failing even on an in-order CPU. It literally
> doesn't even need OoO to fail, all it needs is a sufficiently deep write
> buffer on an in-order core. And to fail in practice, maybe there needs to
> be lots of writes in that buffer, and some bad luck, but the thing is,
> write buffers are not coherent between cores - so the write may have
> happened as far as the core that does it is concerned, but other cores
> (or even HT) may not see the new value until after the read has taken
> effect.

Hm, OK, I see you must be right there. The trick is only guaranteed
to work if you operate on exactly the same memory location, I guess (or
for store/store vs load/load sequences). In which case, atomic ops
can't be avoided for the unlock case :(

Well, it could use a full memory barrier instead of an atomic op for
unlock, which might help on some architectures, but on x86 I don't think
it gains much.
