Re: Strange block/scsi/workqueue issue

From: Tejun Heo
Date: Mon Apr 11 2011 - 20:14:39 EST


Hello,

On Mon, Apr 11, 2011 at 06:52:10PM +0100, Steven Whitehouse wrote:
> WARNING: at lib/kref.c:34 kref_get+0x2d/0x30()
> Hardware name: PowerEdge R710
> Modules linked in:
> Pid: 12, comm: kworker/2:0 Not tainted 2.6.39-rc2+ #188
> Call Trace:
> [<ffffffff8108fa9a>] warn_slowpath_common+0x7a/0xb0
> [<ffffffff8108fae5>] warn_slowpath_null+0x15/0x20
> [<ffffffff813c97cd>] kref_get+0x2d/0x30
> [<ffffffff813c81ca>] kobject_get+0x1a/0x30
> [<ffffffff814607f4>] get_device+0x14/0x20
> [<ffffffff81478b57>] scsi_request_fn+0x37/0x4a0
> [<ffffffff813aff2a>] __blk_run_queue+0x6a/0x110
> [<ffffffff813b1f66>] blk_delay_work+0x26/0x40
> [<ffffffff810aa9c7>] process_one_work+0x197/0x520
> [<ffffffff810acfec>] worker_thread+0x15c/0x330
> [<ffffffff810b1f16>] kthread+0xa6/0xb0
> [<ffffffff816870e4>] kernel_thread_helper+0x4/0x10
> ---[ end trace 3681e9da2630a94b ]---

Hmm, the root cause could be a premature or double put of the
scsi_device. Without the patch, that makes scsi_request_fn() enter the
device destruction path prematurely, which triggers the deadlock. With
the patch, the deadlock is gone but the ref count still reaches zero
too early, which triggers the kref warning on the next request.
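
To make the suspected sequence concrete, here is a minimal userspace
sketch of how one unbalanced put can surface only later, as a warning
on the next get. Everything in it is hypothetical: toy_kref and its
helpers are toy stand-ins rather than the kernel's kref code, and where
the stray put would actually come from in the scsi path is exactly what
is still unknown. It assumes only that the warning in kref_get() fires
when the count is already zero.

```c
#include <stdio.h>

/* Toy stand-in for struct kref; not the kernel implementation. */
struct toy_kref {
    int refcount;
    int warnings;   /* gets that found the count already at zero */
    int released;   /* times the release path ran */
};

static void toy_kref_init(struct toy_kref *k)
{
    k->refcount = 1;    /* initial reference held by the owner */
    k->warnings = 0;
    k->released = 0;
}

static void toy_kref_get(struct toy_kref *k)
{
    /* Assumed to mirror the check behind the lib/kref.c warning. */
    if (k->refcount == 0) {
        k->warnings++;
        fprintf(stderr, "WARNING: get on object with zero refcount\n");
    }
    k->refcount++;
}

static void toy_kref_put(struct toy_kref *k)
{
    if (--k->refcount == 0)
        k->released++;  /* object would be torn down here */
}

/* Returns the number of warnings seen during the sequence. */
int simulate_extra_put(void)
{
    struct toy_kref dev;

    toy_kref_init(&dev);    /* count 1 */

    /* First request: balanced get/put, count goes 2 then back to 1. */
    toy_kref_get(&dev);
    toy_kref_put(&dev);

    /* Hypothetical stray put: count drops to 0, release runs early. */
    toy_kref_put(&dev);

    /* Next request: the get sees a zero count and warns. */
    toy_kref_get(&dev);
    toy_kref_put(&dev);

    return dev.warnings;
}
```

Calling simulate_extra_put() from a small main() reports one warning,
and it comes from the request after the stray put, not from the stray
put itself.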

The problem doesn't seem widespread, so something about the setup must
be peculiar. Steven, can you please detail the setup (and the steps
needed to trigger the problem) and attach the full boot log? James,
any ideas?

Thanks.

--
tejun