Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patchadded to -mm tree

From: Linus Torvalds
Date: Sat Feb 23 2008 - 14:00:01 EST




On Sat, 23 Feb 2008, Oleg Nesterov wrote:
>
> Yes, but still I suspect wmb() is not enough. Note that try_to_wake_up()
> first checks (reads) the task->state,
>
> if (!(old_state & state))
> goto out;
>
> without the full mb() it is (in theory) possible that try_to_wake_up()
> first reads TASK_RUNNING and only then sets CONDITION. IOW, STORE and
> LOAD could be re-ordered.

No. The spinlock can have preceding stores (and loads, for that matter)
percolate *into* the locked region, but a spinlock can *not* have loads
(and stores) escape *out* of the region withou being totally broken.

So the spinlock that predeces that load of "old_state" absolutely *must*
be a memory barrier as far as that load is concerned.

Spinlocks are just "one-way" permeable. They stop both reads and writes,
but they stop them from escaping up to earlier code (and the spin-unlock
in turn stops reads and writes within the critical section from escaping
down past the unlock).

But they definitely must stop that direction, and no loads or stores that
are inside the critical section can possibly be allowed to be done outside
of it, or the spinlock would be pointless.

[ Yeah, yeah, I guess that in theory you could serialize spinlocks only
against each other, and their serialization would be in a totally
different "domain" from normal reads and writes, and then the spinlock
wouldn't act as a barrier at all except for stuff that is literally
inside another spinlock, but I don't think anybody really does that ]

So I think a simple smp_wmb() should be ok.

That said, I do not *mind* the notion of "smp_mb_before/after_spinlock()",
and just have architectures do whatever they want (with x86 just being a
no-op). I don't think it's wrong in any way, and may be the really
generic solution for some theoretical case where locking is done not by
using the normal cache coherency, but over a separate lock network. But I
would suggest against adding new abstractions unless there are real cases
of it mattering.

(The biggest reason to do it in practice is probably architectures that
have a really slow smp_wmb() and that also have a spinlock that is already
serializing enough, but if this is only for one single use - in
try_to_wake_up() - even that doesn't really seem to be a very strong
argument).

> A bit offtopic, but let's take another example, __queue_work()->insert_work().
> With some trivial modification we can move set_wq_data() outside of cwq->lock.
> But according to linux's memory model we still need wmb(), which is a bit
> annoying. Perhaps we can also add smp_wmb_before_spinlock(). Not sure this
> is not too ugly though.

Is that really so performance-critical? We still need the spinlock for the
actual list manipulation, so why would we care?

And the smp_wmb() really should be free. It's literally free on x86, but
doing a write barrier is generally a cheap operation on any sane
architecture (ie generally cheaper than a read barrier or a full barrier:
it should mostly be a matter of inserting a magic entry in the write
buffers).

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/