Re: [PATCH V3] block: Mitigate lock unbalance caused by lockswitching

From: Tejun Heo
Date: Mon May 28 2012 - 21:41:32 EST


On Tue, May 29, 2012 at 09:39:01AM +0800, Asias He wrote:
> Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally
> supplied queue_lock before blk_drain_queue(). Switching the lock would
> introduce lock unbalance because theads which have taken the external
> lock might unlock the internal lock in the during the queue drain. This
> patch mitigate this by disconnecting the lock after the queue draining
> since queue draining makes a lot of request_queue users go away.
>
> However, please note, this patch only makes the problem less likely to
> happen. Anyone who still holds a ref might try to issue a new request on
> a dead queue after the blk_cleanup_queue() finishes draining, the lock
> unbalance might still happen in this case.
>
> =====================================
> [ BUG: bad unlock balance detected! ]
> 3.4.0+ #288 Not tainted
> -------------------------------------
> fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
> [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
> but there are no more locks to release!
>
> other info that might help us debug this:
> 1 lock held by fio/17706:
> #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>]
> get_request_wait+0x19a/0x250
>
> stack backtrace:
> Pid: 17706, comm: fio Not tainted 3.4.0+ #288
> Call Trace:
> [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
> [<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100
> [<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330
> [<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0
> [<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0
> [<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0
> [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
> [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
> [<ffffffff810e0079>] lock_release+0xd9/0x250
> [<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40
> [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
> [<ffffffff81328faa>] generic_make_request+0xca/0x100
> [<ffffffff81329056>] submit_bio+0x76/0xf0
> [<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60
> [<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70
> [<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0
> [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
> [<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60
> [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
> [<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60
> [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
> [<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760
> [<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0
> [<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0
> [<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0
> [<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0
> [<ffffffff811e8250>] ? aio_fsync+0x30/0x30
> [<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0
> [<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80
> [<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff811eae50>] sys_io_submit+0x10/0x20
> [<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b
>
> Changes since v2: Update commit log to explain how the code is still
> broken even if we delay the lock switching after the drain.
> Changes since v1: Update commit log as Tejun suggested.
>
> Signed-off-by: Asias He <asias@xxxxxxxxxx>

Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/