Re: Race conditions galore (2.0.33 and possibly 2.1.x)

Linus Torvalds (torvalds@transmeta.com)
22 Dec 1997 22:04:20 GMT


In article <19971222214619.45376@cuci.nl>,
Stephen R. van den Berg <srb@cuci.nl> wrote:
>
>But, to get back to the matter at hand:
>1. I had a 2.0.33+ kernel, programs where hanging at times; all
> in buffer.c. Rebooting did not help.
>2. I patched buffer.c with the shown patch. Rebooted and this time
> *one* program hung again. This time in filemap.c.
>3. Patched filemap.c the same as buffer.c.
>4. Rebooted, hang-free for 19 hours up till now and still counting.
>
>No other changes. Nor in the kernel, nor in userland.

Clear enough. However, the change in question still is only an ordering
change, and one that should not make any difference what-so-ever (it
re-orders the setting of "current->state" only across things that don't
actually _care_ about the state). So we have it pinpointed to a (very
small) function, but I still don't like the fact that we haven't
pinpointed a cause..

The fact that this happens on an AMD K5 makes me wonder - does anybody
have an errata sheet for the AMD chips? There might be out-of-order
issues or something else strange going on that is K5-specific.

It's not that I don't want to apply your patch - the patch is no worse
than the current code. But I _do_ want to know why it makes any
difference for you, when it so patently shouldn't make any difference at
all..

>BTW, I checked the hung processes with kdebug. I did find that the
>buffer they where hanging on was:
>1. *NOT* locked anymore. The buffer state was 0x9, the locked mask is 0x2
> I believe.

Actually, it's bit #2, ie 0x4. But you're right: 0x9 is certainly not
locked (it's "Uptodate+Touched", which is pretty much normal).

>2. We were still in its wait queue (first on the list), but we were not the
> only one. If I recall correctly there was at least one other entry,
> if not more (I did not know yet at that point that the wait_queues
> where circular).

Maybe the bug is that something marked the buffers as not locked without
waking anything up? Then your change in ordering might make a
difference, if the buffer has been touched multiple times.

>Hmmm... Just thinking aloud here, so bear with me...
>Looking at wake_up_process() I see that it indeed writes over p->state
>in any case. What's p->next_run? When is that set?
>Could it be that p->next_run is set and about to be cleared without
>adding the task to the runqueue (in a different part of the kernel)?

No, if we forgot to add the process to the run-queue, it would still
have been marked as TASK_RUNNABLE - even though it would never have been
actually run. And you said that the stuck processes are always stuck in
disk wait according to "ps"... So wake_up_process() was never called at
all.

>One more thing... What if schedule() is being invoked with
>current->state == TASK_RUNNING ?

This is normal, and happens all the time. In particular, it happens
after any process has used up its time-slice..

>What if someone changes current->state while schedule() is running
>(that *should* not be possible, considering that most of it runs
>with interrupts disabled)?

Nothing but the task itself can ever set "state" to anything but
runnable (ie you can only wake a task up from another task, you can
never try to force it to sleep).

>>I wonder whether gcc might be changing the ordering here. That would
>>explain why your patch would make a difference..
>
>Possibly, but not in the parts where I patched. Maybe there are parts
>in sched.c which are being reordered?

The scheduling is safe - I've looked at that code after the compiler has
munged it often enough, and it's been looked at by lots of people.

>>In fact, the more I think about it, the more your patch makes sense
>>considering what gcc might do to the sources. Do you by any chance use
>>egcs or a recent snapshot of gcc?
>
>Sorry, no. gcc version 2.7.2.1 (from a recent Debian distribution).

Ok, 2.7.2.1 definitely doesn't do those kinds of re-ordering
optimizations (it would require memory alias analysis for correct code
generation, something that 2.7.x won't do). Hmm.. That leaves me with
two possibilities:
- something clears the locked state without waking people up. Do you
use "md" or anything else that plays around with buffers?
- really strange K5 bug

Can anybody else come up with any ideas?

Linus