Re: bio linked list corruption.

From: Linus Torvalds
Date: Tue Dec 06 2016 - 11:53:18 EST


On Tue, Dec 6, 2016 at 12:16 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>
>> Of course, I'm really hoping that this shmem.c use is the _only_ such
>> case. But I doubt it.
>
> $ git grep DECLARE_WAIT_QUEUE_HEAD_ONSTACK | wc -l
> 28

Hmm. Most of them seem to be ok, because they use "wait_event()",
which will always remove itself from the wait-queue. And they do it
from the place that allocates the wait-queue.

IOW, the mm/shmem.c case really was fairly special, because it just
did "prepare_to_wait()", and then did a finish_wait() - and not in the
thread that allocated it on the stack.

So it's really that "some _other_ thread allocated the waitqueue on
the stack, and now we're doing a wait on it" that is bad.

So the normal pattern seems to be:

- allocate wq on the stack
- pass it on to a waker
- wait for it

and that's ok, because as part of "wait for it" we will also be
cleaning things up.

The reason mm/shmem.c was buggy was that it did

- allocate wq on the stack
- pass it on to somebody else to wait for
- wake it up

and *that* is buggy, because it's the waiter, not the waker, that
normally cleans things up.

Is there some good way to find this kind of pattern automatically, I wonder....

Linus