Re: NULL pointer dereference at blk_drain_queue

From: Asias He
Date: Thu Jun 14 2012 - 09:18:51 EST


On 06/14/2012 05:42 PM, Jiri Slaby wrote:
On 06/14/2012 11:16 AM, Jens Axboe wrote:
On 06/14/2012 11:04 AM, Jiri Slaby wrote:
Hi,

with today's -next I'm (reproducibly) getting this while updating packages:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8108cd16>] __wake_up_common+0x26/0x90
PGD 463f1067 PUD 463f2067 PMD 0
Oops: 0000 [#1] SMP
CPU 1
Modules linked in:
Pid: 2711, comm: kworker/1:0 Not tainted 3.5.0-rc2-next-20120614_64+ #1752 Bochs Bochs
RIP: 0010:[<ffffffff8108cd16>] [<ffffffff8108cd16>] __wake_up_common+0x26/0x90
RSP: 0018:ffff880047221cb0 EFLAGS: 00010082
RAX: 0000000000000086 RBX: ffff880046350888 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff880046350888
RBP: ffff880047221cf0 R08: 0000000000000000 R09: 00000001000c0009
R10: ffff880047804480 R11: 0000000000000000 R12: ffff880046350890
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff880049700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000045ced000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/1:0 (pid: 2711, threadinfo ffff880047220000, task ffff8800435bc5c0)
Stack:
000000004628da68 0000000000000000 ffff88004970d340 ffff880046350888
0000000000000086 0000000000000003 0000000000000000 0000000000000000
ffff880047221d30 ffffffff8108d9a3 ffff88004970d340 ffff880046350848
Call Trace:
[<ffffffff8108d9a3>] __wake_up+0x43/0x70
[<ffffffff81267f96>] blk_drain_queue+0xf6/0x120
[<ffffffff8126803f>] blk_cleanup_queue+0x7f/0xd0
[<ffffffff814a9a80>] md_free+0x50/0x70
[<ffffffff8127b3c2>] kobject_cleanup+0x82/0x1b0
[<ffffffff8127b24b>] kobject_put+0x2b/0x60
[<ffffffff814a97ef>] mddev_delayed_delete+0x2f/0x40
[<ffffffff8107e1ab>] process_one_work+0x11b/0x3f0
[<ffffffff814a97c0>] ? restart_array+0xc0/0xc0
[<ffffffff8107f94e>] worker_thread+0x12e/0x340
[<ffffffff8107f820>] ? manage_workers.isra.29+0x1f0/0x1f0
[<ffffffff81084e1e>] kthread+0x8e/0xa0
[<ffffffff8160add4>] kernel_thread_helper+0x4/0x10
[<ffffffff81084d90>] ? flush_kthread_worker+0x70/0x70
[<ffffffff8160add0>] ? gs_change+0xb/0xb
Code: 80 00 00 00 00 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67 08 53 48 83 ec 18 89 55 c4 48 8b 57 08 4c 89 45 c8 <4c> 8b 2a 48 8d 42 e8 49 83 ed 18 49 39 d4 75 0d eb 40 0f 1f 84

It's a bug in local commit bc85cf83, for stacked devices we have not
initialized the wait queues. So the below should fix it, as would always
initializing all queue structures even for the partial use case.


diff --git a/block/blk-core.c b/block/blk-core.c
index b477fa0..93eb3e4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -415,10 +415,12 @@ void blk_drain_queue(struct request_queue *q, bool drain_all)
 	 * allocation path, so the wakeup chaining is lost and we're
 	 * left with hung waiters. We need to wake up those waiters.
 	 */
-	spin_lock_irq(q->queue_lock);
-	for (i = 0; i < ARRAY_SIZE(q->rq.wait); i++)
-		wake_up_all(&q->rq.wait[i]);
-	spin_unlock_irq(q->queue_lock);
+	if (q->request_fn) {
+		spin_lock_irq(q->queue_lock);
+		for (i = 0; i < ARRAY_SIZE(q->rq.wait); i++)
+			wake_up_all(&q->rq.wait[i]);
+		spin_unlock_irq(q->queue_lock);
+	}
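
For readers following along, here is a rough userspace sketch of the two options mentioned above: guarding the wakeup on q->request_fn, or always initializing the wait queues. It is not kernel code. Every name in it (toy_queue, wait_head, toy_drain and so on) is made up for illustration, and it assumes the queue structure starts out zeroed, so an uninitialized wait head has a NULL next pointer. Where the kernel would dereference that NULL pointer, the toy wakeup returns false instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are real kernel types. */
struct wait_head {
	struct wait_head *next;	/* NULL until initialized (assumes zeroed memory) */
	int wakeups;		/* counts wakeups so the effect is observable */
};

#define NR_WAIT 2

struct toy_queue {
	/* NULL models a stacked (bio-based) queue such as md's */
	void (*request_fn)(struct toy_queue *q);
	struct wait_head wait[NR_WAIT];
};

static void toy_request_fn(struct toy_queue *q) { (void)q; }

static void wait_head_init(struct wait_head *w)
{
	w->next = w;		/* empty circular list points at itself */
	w->wakeups = 0;
}

/* Returns false where the kernel would have dereferenced NULL. */
static bool toy_wake_up_all(struct wait_head *w)
{
	if (w->next == NULL)
		return false;
	w->wakeups++;
	return true;
}

/*
 * Set up a queue. Wait heads are initialized only for request-based
 * queues unless always_init is set (the alternative fix).
 */
static void toy_queue_init(struct toy_queue *q, bool stacked, bool always_init)
{
	size_t i;

	memset(q, 0, sizeof(*q));
	if (!stacked)
		q->request_fn = toy_request_fn;
	if (!stacked || always_init)
		for (i = 0; i < NR_WAIT; i++)
			wait_head_init(&q->wait[i]);
}

/*
 * Returns the number of wait heads woken, or -1 if an uninitialized
 * head would have been touched. With guard set, the wakeup loop is
 * skipped entirely when request_fn is NULL, as in the patch above.
 */
static int toy_drain(struct toy_queue *q, bool guard)
{
	size_t i;
	int woken = 0;

	assert(q != NULL);
	if (guard && !q->request_fn)
		return 0;
	for (i = 0; i < NR_WAIT; i++) {
		if (!toy_wake_up_all(&q->wait[i]))
			return -1;
		woken++;
	}
	return woken;
}
```

In this model, draining a stacked queue without the guard and without always_init hits the uninitialized head. Either the guard or the unconditional initialization avoids that.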

Yes, that fixed it.

Jiri, good to hear this fixes it for you. BTW, how do you trigger this issue?

Jens, do you prefer to fix it up in your tree yourself or wait for a patch from me?

--
Asias

