Re: + kthread-add-a-missing-memory-barrier-to-kthread_stop.patchadded to -mm tree

From: Linus Torvalds
Date: Sat Feb 23 2008 - 12:56:06 EST




On Sat, 23 Feb 2008, Oleg Nesterov wrote:
>
> In short: wake_up_process() doesn't imply mb(), this means that _in theory_
> the commonly used code like
>
> set_current_state(TASK_INTERRUPTIBLE);
> if (CONDITION)
> return;
> schedule();
>
> is racy wrt
>
> CONDITION = 1;
> wake_up_process(p);
>
> I'll be happy to be wrong though, please correct me.

Well, you should be wrong on x86, because the spinlock at the head of
wake_up_process() (well, "try_to_wake_up()" to be exact) will be a full
memory barrier.

But yeah, in general spinlocks can have weaker semantics, and let
preceding writes percolate into the critical section and thus past the
point that actually sets task->state.

And I do agree that we should *not* add a memory barrier in the caller
(that's just going to be really confusing for everybody, and make these
things much harder than they should be), and we should make sure that the
above sequence is always race-free.

I also think that a full memory barrier is overkill. We should be ok with
just adding a write barrier to the top of wake_up_process(), no? That way,
we know that the CONDITION will be seen on the other CPU before it sees
task->state change to TASK_RUNNING, so with the memory barrier int he
"set_current_state()", we know that the other side will see the new
condition _or_ it will see TASK_RUNNING when it hits schedule.

No?

(smp_wmb() also has the advantage of being a no-op on x86, where it's not
needed).

But memory ordering is hard. Maybe I'm not thinking right about this.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/