Re: Race conditions galore (2.0.33 and possibly 2.1.x)

Stephen R. van den Berg (srb@cuci.nl)
Tue, 23 Dec 1997 03:16:28 +0100


Linus Torvalds wrote:
>Stephen R. van den Berg <srb@cuci.nl> wrote:
>>Could it be that p->next_run is set and about to be cleared without
>>adding the task to the runqueue (in a different part of the kernel)?

>No, if we forgot to add the process to the run-queue, it would still
>have been marked as TASK_RUNNABLE - even though it would never have been
>actually run. And you said that the stuck processes are always stuck in
>disk wait according to "ps"... So wake_up_process() was never called at
>all.

Well, they were certainly not marked as running or suspended, but I never
said that ps showed them as being in disk wait.
Actually, ps showed them in a state designated with a dot, as in:

PID TT STAT TIME
185 ?  .    56:02 /usr/sbin/innd -p4 -r -i0 -c4 -L

I'm not sure where the dot comes from, or what it is supposed to designate.
(I'm using proc-ps as shipped in the bo distribution of Debian.)

I did not check current->state from within kdebug, because gdb told me
that current was not available in that context.

> - something clears the locked state without waking people up. Do you
> use "md" or anything else that plays around with buffers?

Which still leaves me wondering why my rearrangement fixes things.
Apparently the only behaviour it changes is that *if* current->state is
altered during the execution of run_task_queue(&tq_disk), we no longer
overwrite it before jumping into schedule().
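
To make the ordering difference concrete, here is a minimal user-space
sketch. It is not the kernel source: struct task, run_disk_queue() and the
two state_at_schedule_*() functions are hypothetical stand-ins, and it
assumes the rearrangement amounts to assigning the sleeping state before
running the disk task queue instead of after it. Each function returns the
state the task would carry into schedule().

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel's task states. */
enum task_state { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

/* Hypothetical, stripped-down task: only the state matters here. */
struct task { enum task_state state; };

/*
 * Stand-in for run_task_queue(&tq_disk). If wake is non-zero, the queued
 * work marks the task runnable while it executes, modelling the
 * "current->state is altered" case.
 */
void run_disk_queue(struct task *t, int wake)
{
    if (wake)
        t->state = TASK_RUNNING;
}

/* Original ordering (assumed): run the queue first, then set the state. */
enum task_state state_at_schedule_original(struct task *t, int wake)
{
    run_disk_queue(t, wake);
    t->state = TASK_UNINTERRUPTIBLE;  /* overwrites any change made above */
    return t->state;                  /* what schedule() would see */
}

/* Rearranged ordering (assumed): set the state first, then run the queue. */
enum task_state state_at_schedule_rearranged(struct task *t, int wake)
{
    t->state = TASK_UNINTERRUPTIBLE;
    run_disk_queue(t, wake);          /* a wakeup here is preserved */
    return t->state;                  /* what schedule() would see */
}
```

With wake set, the original ordering enters schedule() with
TASK_UNINTERRUPTIBLE and the rearranged one with TASK_RUNNING; with wake
clear, both enter with TASK_UNINTERRUPTIBLE. The sketch only shows that the
two orderings differ in this one case, not why the real kernel would hit it.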

> - really strange K5 bug

Which would be even more difficult to explain in the light of my
patch.

-- 
Sincerely,                                                          srb@cuci.nl
           Stephen R. van den Berg (AKA BuGless).

He did a quarter of the work in *half* the time!