Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts handling requests

From: Akinobu Mita
Date: Mon Jun 29 2015 - 10:25:54 EST


2015-06-28 1:08 GMT+09:00 Ming Lei <tom.leiming@xxxxxxxxx>:
> On Sat, Jun 27, 2015 at 1:14 AM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>> Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>>> 2015-06-26 0:40 GMT+09:00 Ming Lei <tom.leiming@xxxxxxxxx>:
>>> > On Thu, 25 Jun 2015 21:49:43 +0900
>>> > Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>>> >> For example, there is a single hw queue (hctx) and two CPU queues
>>> >> (ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 is just onlined and
>>> >> a request is inserted into ctx1->rq_list and set bit0 in pending
>>> >> bitmap as ctx1->index_hw is still zero.
>>> >>
>>> >> And then while running hw queue, flush_busy_ctxs() finds bit0 is set
>>> >> in pending bitmap and tries to retrieve requests in
>>> >> hctx->ctxs[0].rq_list. But htx->ctxs[0] is ctx0, so the request in
>>> >> ctx1->rq_list is ignored.
>>> >
>>> > Per current design, the request should have been inserted into ctx0 instead
>>> > of ctx1 because ctx1 isn't mapped yet even though ctx1->cpu becomes ONLINE.
>>> >
>>> > So how about the following patch? which looks much simpler.
>>>
>>> OK, I'll try this patch to see if the problem disappears.
>>
>> This doesn't fix the problem. Because:
>>
>>> > diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> > index f537796..2f45b73 100644
>>> > --- a/block/blk-mq.c
>>> > +++ b/block/blk-mq.c
>>> > @@ -1034,7 +1034,12 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
>>> > struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
>>> >
>>> > current_ctx = blk_mq_get_ctx(q);
>>> > - if (!cpu_online(ctx->cpu))
>>> > + /*
>>> > + * ctx->cpu may become ONLINE but ctx hasn't been mapped to
>>> > + * hctx yet because there is a tiny race window between
>>> > + * ctx->cpu ONLINE and doing the remap
>>> > + */
>>> > + if (!blk_mq_ctx_mapped(ctx))
>>> > rq->mq_ctx = ctx = current_ctx;
>>
>> The process running on just onlined CPU1 in the above example can
>> satisfy this condition and current_ctx will be ctx1. So the same
>> scenario can happen (the request is ignored by flush_busy_ctxs).
>
> Yeah, that is possible, and it should be bug of blk_mq_insert_request(),
> because the function supposes that the current ctx is mapped.
>
> Then I think the approach in your 1st email of this thread may be
> good one, and looks we have to make the remapping during
> CPU UP_PREPARE notifier.

OK, we can move on to other topic that you suggested in the other mail
thread.

>> I found simple alternative solution that assigns the offline CPUs
>> unique ctx->index_hw.
>
> This solution looks simpler, but it may not be correct.

You are right. This solution needs amendments, just in case we needs
to come back this solution instead of the first approach.

>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 594eea0..a8fcfbf 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1787,10 +1787,11 @@ static void blk_mq_map_swqueue(struct
>> request_queue *q)
>> */
>> queue_for_each_ctx(q, ctx, i) {
>> /* If the cpu isn't online, the cpu is mapped to first hctx */
>> - if (!cpu_online(i))
>> - continue;
>> + if (!cpu_online(i) && cpu_possible(i))
>
> The cpu_possible() check isn't needed.

OK.

>> + hctx = q->mq_ops->map_queue(q, 0);
>> + else
>> + hctx = q->mq_ops->map_queue(q, i);
>
> The above change supposes that all offline CPUs(ctxs)
> share the same 'hctx' mapped from CPU 0, and that is
> obvious wrong.
>
> All offline CPUs should have shared the 1st 'hctx' instead
> of the 'hctx' mapped from CPU 0.

Do you mean that we shoud use 'q->queue_hw_ctx[0]' for offline CPU?

>>
>> - hctx = q->mq_ops->map_queue(q, i);
>> cpumask_set_cpu(i, hctx->cpumask);
>
> CPU i shouldn't have been set on hctx->cpumask in this approach
> if it isn't online.

OK.

>> ctx->index_hw = hctx->nr_ctx;
>> hctx->ctxs[hctx->nr_ctx++] = ctx;
>
> I am not sure the current code is ready about adding offline 'ctx' into
> 'hctx', and there are some observalbe effects at least:
>
> - blk_mq_run_hw_queue() can run even the hctx hasn't mapped
> 'ctx'

Is this fixed by not setting hctx->cpumask for offline CPUs?

> - the offline ctx kobject will be exposed to user space in sysfs

OK. this should be avoided.

> - blk_mq_hw_queue_mapped() may becomes always true
> - set->tags[i](request entries) may not be freed even if there
> aren't mapped 'ctx' in one 'hctx'

Aren't these only happend for the 1st hctx (i.e. 'q->queue_hw_ctx[0]')?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/