Re: bio linked list corruption.

From: Linus Torvalds
Date: Mon Dec 05 2016 - 12:55:17 EST


On Mon, Dec 5, 2016 at 9:09 AM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
>
> The warning shows that it made it past the list_empty_careful() check
> in finish_wait() but then bugs out on the &wait->task_list
> dereference.
>
> Anything stick out?

I hate that shmem waitqueue garbage. It's really subtle.

I think the problem is that "wake_up_all()" in shmem_fallocate()
doesn't necessarily wake up everything. It wakes up TASK_NORMAL -
which does include TASK_UNINTERRUPTIBLE, but doesn't actually mean
"everything on the list".

I think that what happens is that the waiters somehow move from
TASK_UNINTERRUPTIBLE to TASK_RUNNING early, and this means that
wake_up_all() will ignore them and leave them on the list, so the
on-stack list is no longer empty at the end.
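
To make that failure mode concrete, here is a toy userspace model, not
kernel code. The state values and TASK_NORMAL match the kernel's
definitions, but struct toy_waiter and the toy_* functions are made up
for illustration. The model assumes the waiters use an autoremove-style
wake function, which unlinks the entry only when the wakeup actually
succeeds:

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. State values mirror the kernel's. */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define TASK_NORMAL (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for a wait queue entry plus its task. */
struct toy_waiter {
	int state;	/* task state */
	int queued;	/* 1 while still linked on the wait queue */
};

/*
 * Models a wakeup with autoremove semantics (assumed): the entry is
 * unlinked only if the task state matched the wake mask.
 */
static int toy_wake_one(struct toy_waiter *w, int mode)
{
	if (!(w->state & mode))
		return 0;	/* already running: nothing to wake */
	w->state = TASK_RUNNING;
	w->queued = 0;		/* removed on successful wakeup */
	return 1;
}

/* Models wake_up_all(); returns how many entries are still queued. */
static size_t toy_wake_up_all(struct toy_waiter *w, size_t n)
{
	size_t i, left = 0;

	for (i = 0; i < n; i++) {
		if (w[i].queued)
			toy_wake_one(&w[i], TASK_NORMAL);
		left += w[i].queued;
	}
	return left;
}

/* Two waiters; the second was already woken by somebody else. */
static void toy_demo_wakeup(void)
{
	struct toy_waiter w[2] = {
		{ TASK_UNINTERRUPTIBLE, 1 },
		{ TASK_RUNNING, 1 },
	};

	assert(toy_wake_up_all(w, 2) == 1);	/* one entry left behind */
}
```

The waiter that was already TASK_RUNNING never matches the mask, so it
stays queued even though "wake up all" has run.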

And the way *THAT* can happen is that the task is on some *other*
waitqueue as well, and that other waitqueue wakes it up. That's not
impossible, you can certainly have people on wait-queues that still
take faults.

Or somebody just uses a directed wake_up_process() or something.

Since you apparently can recreate this fairly easily, how about trying
this stupid patch?

NOTE! This is entirely untested. I may have screwed this up entirely.
You get the idea, though - just remove the wait queue head from the
list - the list entries stay around, but nothing points to the stack
entry (that we're going to free) any more.
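
The list manipulation behind that idea can be sketched in userspace.
The list helpers below are simplified copies of the kernel's circular
doubly linked list, and toy_ring_avoids() and toy_demo_head_removal()
are hypothetical names used only for this illustration. It shows just
the pointer effect of unlinking the head, nothing about locking:

```c
#include <assert.h>

/* Simplified userspace copy of the kernel's circular list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Unlink 'entry'; its neighbours now point at each other. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Returns 1 if no node on the ring starting at 'start' points at 'target'. */
static int toy_ring_avoids(struct list_head *start, struct list_head *target)
{
	struct list_head *p = start;

	do {
		if (p->next == target || p->prev == target)
			return 0;
		p = p->next;
	} while (p != start);
	return 1;
}

static void toy_demo_head_removal(void)
{
	struct list_head head, a, b;	/* head plays the on-stack waitqueue */

	INIT_LIST_HEAD(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);

	assert(!list_empty(&head));
	assert(!toy_ring_avoids(&a, &head));	/* entries still reference head */
	list_del(&head);
	assert(toy_ring_avoids(&a, &head));	/* ring is now a <-> b only */
}
```

After list_del() on the head, the leftover entries form a ring among
themselves, so a later unlink of one of them never writes into the
head's memory.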

And add the warning to see if this actually ever triggers (and because
I'd like to see the callchain when it does, to see if it's another
waitqueue somewhere or what..)

Linus
diff --git a/mm/shmem.c b/mm/shmem.c
index 166ebf5d2bce..a80148b43476 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2665,6 +2665,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		spin_lock(&inode->i_lock);
 		inode->i_private = NULL;
 		wake_up_all(&shmem_falloc_waitq);
+		if (WARN_ON_ONCE(!list_empty(&shmem_falloc_waitq.task_list)))
+			list_del(&shmem_falloc_waitq.task_list);
 		spin_unlock(&inode->i_lock);
 		error = 0;
 		goto out;