Re: Race conditions galore (2.0.33 and possibly 2.1.x)

Linus Torvalds (torvalds@transmeta.com)
22 Dec 1997 17:02:54 GMT


In article <19971222151656.32023@cuci.nl>,
Stephen R. van den Berg <srb@cuci.nl> wrote:
>Could someone who created these wait-queue loops (there are several of them
>in the kernel, potentially all of them are race conditions waiting to
>happen) confirm if I'm reading this correctly? Maybe I still overlooked
>possible problems, but it seems like this is the minimally-correct change.

No, your patch should be unnecessary as far as I can see.

Anyway, the basic race-free wait loop looks like this (there are
variations, but this is one of the basic versions that you find in
various places):

if (should_wait_condition) {
add_wait_queue(..);
repeat:
current->state = TASK_UNINTERRUPTIBLE;
if (should_wait_condition) {
schedule();
goto repeat;
}
remove_wait_queue(..);
current->state = TASK_RUNNING;
}

There are only two important rules:
- you have to add yourself to the wait queue _before_ testing for the
condition.
- you have to mark yourself asleep _before_ testing for the condition.

The rest is essentially just "fluff". For example, above we test for
the condition twice: first we have the common case where the condition
isn't true, and for that case we don't even bother with the wait-queues
at all because we know we won't be sleeping. Then we add ourselves to
the wait-queue, but because of the two rules above we now have to
re-test the condition.

The above may look complex, but it's really fairly straight-forward, and
the really important thing is that you don't have to do any locking at
all (well, the wait queue manipulation and the scheduling will do their
own locking, but basically nothing else needs to do so).

Anyway, the reason the two simple rules are needed is that _if_ you mark
yourself asleep _and_ have added yourself to the wait queue before
testing for any wake-up condition, then you know that if something
happens to change that condition (and does a "wake_up()", then it will
also have marked you as being awake (and thus the "schedule()" will not
sleep, and we'll catch the condition the second time around).

Now, admittedly the above does depend on a few ordering constraints, and
those ordering constraints are not necessarily true in all SMP systems.
Maybe this was what you alluded to? The rules in question are really
only valid on something that guarantees fairly strict memory ordering
(UP in particular as the obvious case), and a memory barrier might be a
good idea for a truly SMP-safe version. The sleeping scheme was really
designed for UP, I have to admit.

Could you explain what kind of race you were thinking of?

Linus