Re: [PATCH] blk-mq: Avoid that I/O hangs in bt_get()

From: Jens Axboe
Date: Mon Dec 08 2014 - 11:49:54 EST


On 12/08/2014 07:55 AM, Bart Van Assche wrote:
On 11/06/14 14:41, Bart Van Assche wrote:
With kernel 3.18-rc3 and with can_queue=62 I can trigger a hang in
bt_get() easily.

(once more replying to my own e-mail)

Hello Jens,

Finally I found the time to look further into this. The patch below
seems to be sufficient to prevent this hang. However, I'm not a block
layer expert, so I'm not sure whether the patch below makes sense.

Thanks,

Bart.

[PATCH] blk-mq: Fix bt_get() hang

Prevent bt_get() from hanging when there are fewer hardware queues than
CPU threads. The symptoms of the hang were as follows:
* All tags allocated for a particular hardware queue.
* (nr_tags) pending commands for that hardware queue.
* No pending commands for the software queues associated with that
hardware queue.
---
block/blk-mq-tag.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 67ab88b..e88af88 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -256,6 +256,8 @@ static int bt_get(struct blk_mq_alloc_data *data,
break;
}

+ blk_mq_run_hw_queue(hctx, false);
+
blk_mq_put_ctx(data->ctx);

io_schedule();
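The hang and the effect of the added blk_mq_run_hw_queue() call can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: struct sim_queue, sim_try_get(), sim_run_hw_queue() and sim_bt_get() are made-up stand-ins for the tag map, __bt_get(), blk_mq_run_hw_queue() and the bt_get() wait loop, and the model assumes that a dispatched request completes at once and frees its tag. When every tag is held by a request that has not been dispatched yet, a caller that sleeps without kicking the queue is never woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one hardware queue and its tag map.
 * None of these names are kernel APIs. */
struct sim_queue {
	int nr_tags;	/* size of the tag map */
	int used;	/* tags currently allocated */
	int pending;	/* tagged requests queued but not yet dispatched */
	int freed;	/* tags freed so far; a freed tag wakes a sleeper */
};

/* Stand-in for __bt_get(): take a free tag if there is one. */
static bool sim_try_get(struct sim_queue *q)
{
	if (q->used == q->nr_tags)
		return false;
	q->used++;
	return true;
}

/* Stand-in for blk_mq_run_hw_queue(): dispatch all pending requests.
 * Assumption: each dispatched request completes at once and frees its tag. */
static void sim_run_hw_queue(struct sim_queue *q)
{
	assert(q->pending <= q->used);
	q->used -= q->pending;
	q->freed += q->pending;
	q->pending = 0;
}

/* Stand-in for the bt_get() wait loop. Returns true once a tag is
 * allocated, false if the caller would sleep with no wakeup coming. */
static bool sim_bt_get(struct sim_queue *q, bool run_queue_before_sleep)
{
	for (;;) {
		int freed_before = q->freed;	/* like prepare_to_wait() */

		if (sim_try_get(q))
			return true;
		if (run_queue_before_sleep)	/* what the patch adds */
			sim_run_hw_queue(q);
		/* io_schedule(): only a freed tag can wake us. In this
		 * single-threaded model nothing else frees tags, so no
		 * progress since prepare_to_wait() means a hang. */
		if (q->freed == freed_before)
			return false;
	}
}
```

Starting from { .nr_tags = 4, .used = 4, .pending = 4 }, sim_bt_get(&q, false) returns false (the hang), while sim_bt_get(&q, true) dispatches the four pending requests, is "woken" by the freed tags and then allocates one.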

Ah yes, that could be an issue in some cases; we do need to run the queue there. For a tag map shared across hardware queues, however, we might need to run more than just the current queue. For now we can safely assume that we allocate fairly, so it should not be an issue.
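The caveat about shared tag maps can be sketched with a similar model. It is hypothetical: struct sim_shared_tags, sim_run_one() and sim_kick() are illustrative names, not kernel APIs, and a dispatched request is again assumed to complete at once. If another hardware queue holds every tag, kicking only the caller's queue frees nothing; with a fair split, kicking the caller's own queue already frees some tags.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_HW_QUEUES 2

/* Hypothetical model, not kernel code: one tag map shared by several
 * hardware queues. */
struct sim_shared_tags {
	int nr_tags;			/* size of the shared tag map */
	int used;			/* tags allocated across all queues */
	int pending[SIM_NR_HW_QUEUES];	/* undispatched requests per queue */
};

/* Stand-in for running one hardware queue; assumes every dispatched
 * request completes at once and frees its tag. Returns tags freed. */
static int sim_run_one(struct sim_shared_tags *t, int hctx)
{
	int n = t->pending[hctx];

	assert(n <= t->used);
	t->used -= n;
	t->pending[hctx] = 0;
	return n;
}

/* Kick before sleeping: either only the caller's queue or every queue
 * sharing the tag map. Returns the number of tags freed. */
static int sim_kick(struct sim_shared_tags *t, int hctx, bool all_queues)
{
	int freed = 0;

	if (!all_queues)
		return sim_run_one(t, hctx);
	for (int i = 0; i < SIM_NR_HW_QUEUES; i++)
		freed += sim_run_one(t, i);
	return freed;
}
```

With pending = {0, 4}, a caller on queue 0 frees nothing by running its own queue and would sleep forever; running all queues frees all four tags. With the fair split pending = {2, 2}, running queue 0 alone frees two.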

It might be worth experimenting with doing a __bt_get() after the queue run before going to sleep, in case the queue run found completions as well.
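The suggested experiment, retrying the allocation right after the queue run, can be sketched in the same hypothetical model (struct sim_queue and the sim_* helpers are made-up stand-ins, not kernel APIs; dispatched requests are assumed to complete immediately). The sleeps counter shows that the retry lets the caller pick up a freed tag without a trip through io_schedule().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: one hardware queue and its tag map. */
struct sim_queue {
	int nr_tags;	/* size of the tag map */
	int used;	/* tags currently allocated */
	int pending;	/* tagged requests queued but not yet dispatched */
	int freed;	/* tags freed so far; a freed tag wakes a sleeper */
	int sleeps;	/* times the caller went through "io_schedule()" */
};

/* Stand-in for __bt_get(): take a free tag if there is one. */
static bool sim_try_get(struct sim_queue *q)
{
	if (q->used == q->nr_tags)
		return false;
	q->used++;
	return true;
}

/* Stand-in for blk_mq_run_hw_queue(); assumes each dispatched request
 * completes at once and frees its tag. */
static void sim_run_hw_queue(struct sim_queue *q)
{
	assert(q->pending <= q->used);
	q->used -= q->pending;
	q->freed += q->pending;
	q->pending = 0;
}

/* bt_get() wait loop with the queue run, plus an optional retry of the
 * allocation between the queue run and going to sleep. */
static bool sim_bt_get(struct sim_queue *q, bool retry_after_run)
{
	for (;;) {
		int freed_before = q->freed;	/* like prepare_to_wait() */

		if (sim_try_get(q))
			return true;
		sim_run_hw_queue(q);
		/* The run may have completed requests: try again first. */
		if (retry_after_run && sim_try_get(q))
			return true;
		if (q->freed == freed_before)
			return false;	/* would sleep with no wakeup */
		q->sleeps++;		/* slept, then woken by a freed tag */
	}
}
```

From the stuck state { .nr_tags = 4, .used = 4, .pending = 4 }, both variants end up with a tag, but only the variant without the retry records a sleep.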

--
Jens Axboe
