Re: [patch][rfc] x86, mutex: non-atomic unlock (and a rant)

From: Cyrill Gorcunov
Date: Mon Nov 02 2009 - 11:46:36 EST


[Linus Torvalds - Mon, Nov 02, 2009 at 07:20:08AM -0800]
|
| On Mon, 2 Nov 2009, Nick Piggin wrote:
| >
| > Non-atomic unlock for mutexs maybe? I do this by relying on cache
| > coherence on a cacheline basis for ordering rather than the memory
| > consistency of the x86. Linus I know you've told me this is an incorrect
| > assumption in the past, but I'm not so sure.
|
| I'm sure.
|
| This is simply buggy:
|
| > +	atomic_set(&lock->count, 1);
| > +	barrier();
| > +	if (unlikely(lock->waiters))
| > +		fail_fn(lock);
|
| because it doesn't matter one whit whether 'lock->count' and
| 'lock->waiters' are in the same cacheline or not.
|
| The cache coherency deals in cachelines, but the instruction re-ordering
| logic does not. It's entirely possible that the CPU will turn this into
|
|	tmp = lock->waiters;
|	...
|	atomic_set(&lock->count, 1);
|	if (tmp)
|		fail_fn(lock);
|
| and your "barrier()" did absolutely nothing.
...
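To make the failure mode concrete, here is a toy model in C. It is purely illustrative: the structs, the one-entry store buffer and the lost_wakeup() helper are hypothetical, not kernel code. On x86 a store can sit in the CPU's store buffer while a later load from a different address completes, which is observably the same as the load being hoisted above the store. The model replays one interleaving of the proposed unlock against an assumed lock slow path on a second CPU that sets lock->waiters and then rechecks lock->count.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an illustration, not real hardware): each CPU may hold one
 * pending store in a private store buffer. Its own loads see that store
 * (store forwarding); other CPUs see only shared memory until it drains. */
struct shared_mem {
	int count;	/* 1 == unlocked */
	int waiters;
};

struct toy_cpu {
	int *pending_addr;
	int pending_val;
};

static void cpu_store(struct toy_cpu *c, int *addr, int val)
{
	assert(c->pending_addr == NULL);	/* model holds one store at a time */
	c->pending_addr = addr;
	c->pending_val = val;
}

static int cpu_load(const struct toy_cpu *c, const int *addr)
{
	if (c->pending_addr == addr)
		return c->pending_val;		/* forwarded from own buffer */
	return *addr;				/* otherwise shared memory */
}

static void cpu_drain(struct toy_cpu *c)
{
	if (c->pending_addr != NULL)
		*c->pending_addr = c->pending_val;
	c->pending_addr = NULL;
}

/* Replays one interleaving of the proposed unlock (CPU0) against a
 * hypothetical lock slow path (CPU1). Returns 1 on a lost wakeup. */
static int lost_wakeup(int full_barrier)
{
	struct shared_mem m = { .count = 0, .waiters = 0 };	/* lock held */
	struct toy_cpu cpu0 = { NULL, 0 };

	cpu_store(&cpu0, &m.count, 1);		/* atomic_set(&lock->count, 1) */
	if (full_barrier)
		cpu_drain(&cpu0);		/* what a serializing op forces */
	/* barrier() alone changes nothing here: the store is still buffered. */
	int cpu0_saw_waiters = cpu_load(&cpu0, &m.waiters);

	/* CPU1's accesses go straight to memory, as if via locked ops. */
	m.waiters = 1;				/* registers as a waiter ... */
	int cpu1_saw_unlocked = (m.count == 1);	/* ... and rechecks the lock */

	cpu_drain(&cpu0);			/* CPU0's store finally lands */

	/* Neither side saw the other: nobody calls fail_fn(), CPU1 sleeps. */
	return !cpu0_saw_waiters && !cpu1_saw_unlocked;
}
```

With these definitions lost_wakeup(0) returns 1 (the barrier()-only unlock loses the wakeup) and lost_wakeup(1) returns 0.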

If we write it as

	atomic_set(&lock->count, 1);
	some_serializing_op();	/* say, cpuid() */
	if (unlikely(lock->waiters))
		fail_fn(lock);

it should do the trick, though a serializing operation like that
always costs too much.
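A minimal userspace sketch of this first variant, assuming C11 atomics in place of the kernel's atomic_t. The serializing op is swapped for atomic_thread_fence(memory_order_seq_cst), a full barrier that on x86 is typically emitted as mfence or a locked instruction rather than cpuid. struct toy_mutex, toy_unlock() and toy_fail_fn() are hypothetical stand-ins, and the lock side would need matching ordering for this to be a complete algorithm.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the mutex in the patch. */
struct toy_mutex {
	atomic_int count;	/* 1 == unlocked */
	atomic_int waiters;	/* nonzero when someone sleeps on the lock */
};

static int slowpath_calls;	/* counts calls to the fail_fn() stand-in */

static void toy_fail_fn(struct toy_mutex *lock)
{
	(void)lock;
	slowpath_calls++;	/* the real slow path would wake a waiter */
}

static void toy_unlock(struct toy_mutex *lock)
{
	/* Only the lock holder may unlock. */
	assert(atomic_load_explicit(&lock->count, memory_order_relaxed) != 1);

	/* Release: critical-section writes become visible before count. */
	atomic_store_explicit(&lock->count, 1, memory_order_release);

	/* Full fence: on x86 this stops the waiters load below from
	 * completing while the count store still sits in the store buffer. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&lock->waiters, memory_order_relaxed))
		toy_fail_fn(lock);
}
```

The fence is the only store-to-load ordering point here; a compiler-only barrier in its place would reproduce the bug quoted above.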

Another option is to issue two memory writes, re-storing
lock->waiters after reading it, like this:
	int tmp;

	atomic_set(&lock->count, 1);
	tmp = lock->waiters;
	rmb();
	lock->waiters = tmp;
	if (unlikely(lock->waiters))
		fail_fn(lock);

This should be faster than cpuid (though we would somehow have to
make sure gcc doesn't optimize away these redundant operations).
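On the gcc side, one way to keep the compiler from merging or dropping the redundant read and write is to make every access volatile, in the style of the kernel's ACCESS_ONCE(). The sketch below assumes userspace stand-ins: VOLATILE_ACCESS() mimics ACCESS_ONCE(), rmb_stub() is only a compiler barrier (not a CPU read barrier), and struct plain_mutex and plain_fail_fn() are hypothetical. It addresses only the compiler concern; it does not show that the CPU will order the accesses as intended, which is exactly what is in dispute in this thread.

```c
#include <assert.h>

/* Userspace stand-ins (assumptions, not the kernel's definitions):
 * VOLATILE_ACCESS() mimics ACCESS_ONCE(): a volatile access gcc must
 * emit and may not merge or drop. rmb_stub() is a compiler-only
 * barrier here, NOT a CPU read barrier like the kernel's rmb(). */
#define VOLATILE_ACCESS(x)	(*(volatile __typeof__(x) *)&(x))
#define rmb_stub()		__asm__ __volatile__("" ::: "memory")

struct plain_mutex {
	int count;	/* 1 == unlocked */
	int waiters;
};

static int fail_fn_calls;	/* counts calls to the fail_fn() stand-in */

static void plain_fail_fn(struct plain_mutex *lock)
{
	(void)lock;
	fail_fn_calls++;
}

static void plain_unlock(struct plain_mutex *lock)
{
	int tmp;

	assert(VOLATILE_ACCESS(lock->count) != 1);	/* must hold the lock */

	VOLATILE_ACCESS(lock->count) = 1;	/* the atomic_set() */
	tmp = VOLATILE_ACCESS(lock->waiters);	/* redundant read ... */
	rmb_stub();
	VOLATILE_ACCESS(lock->waiters) = tmp;	/* ... written back */
	if (VOLATILE_ACCESS(lock->waiters))
		plain_fail_fn(lock);
}
```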

-- Cyrill