Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts handling requests

From: Akinobu Mita
Date: Thu Jun 25 2015 - 08:49:51 EST


2015-06-25 17:07 GMT+09:00 Ming Lei <tom.leiming@xxxxxxxxx>:
> On Thu, Jun 25, 2015 at 10:56 AM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>> 2015-06-25 1:24 GMT+09:00 Ming Lei <tom.leiming@xxxxxxxxx>:
>>> On Wed, Jun 24, 2015 at 10:34 PM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>>>> Hi Ming,
>>>>
>>>> 2015-06-24 18:46 GMT+09:00 Ming Lei <tom.leiming@xxxxxxxxx>:
>>>>> On Sun, Jun 21, 2015 at 9:52 PM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>>>>>> ctx->index_hw is zero for the CPUs which have never been onlined since
>>>>>> the block queue was initialized. If one of those CPUs is hotadded and
>>>>>> starts handling request before new mappings are established, pending
>>>>>
>>>>> Could you explain a bit what "handling requests" means here? The
>>>>> fact is that blk_mq_queue_reinit() is run after all queues are put
>>>>> into freezing.
>>>>
>>>> Notifier callbacks for the CPU_ONLINE action can run on a CPU other
>>>> than the one which was just onlined. So it is possible for a process
>>>> running on the just-onlined CPU to insert a request and run the hw
>>>> queue before blk_mq_queue_reinit_notify() is actually called with
>>>> action=CPU_ONLINE.
>>>
>>> You are right, because blk_mq_queue_reinit_notify() is always run after
>>> the CPU becomes UP, so there is a tiny window in which the CPU is up
>>> but the mapping is not yet updated. Per the current design, the CPU
>>> just onlined is still mapped to hw queue 0 until the mapping is updated
>>> by blk_mq_queue_reinit_notify().
>>>
>>> But I am wondering why that is a problem, and why you think
>>> flush_busy_ctxs() can't find the requests on the software queue in
>>> this situation?
>>
>> The problem happens when the CPU has just been onlined for the first
>> time since the request queue was initialized. At this time,
>> ctx->index_hw for that CPU is still zero, because
>> blk_mq_queue_reinit_notify() has not been called yet.
>>
>> A request can be inserted into ctx->rq_list, but
>> blk_mq_hctx_mark_pending() then marks the wrong bit position as busy,
>> since ctx->index_hw is zero.
>
> It isn't the wrong bit, since the CPU that was just onlined is still
> mapped to hctx 0 at that time.

ctx->index_hw is not the CPU-queue to hw-queue mapping;
ctx->index_hw is the index into hctx->ctxs[] for this ctx.
Each ctx in a hw queue should have a unique ctx->index_hw.
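
For reference, this is roughly how the mapping gets established
(trimmed from blk_mq_map_swqueue() in blk-mq.c; simplified here, so
treat it as an illustration rather than the exact upstream code):

	/* Map software queues to hardware queues */
	queue_for_each_ctx(q, ctx, i) {
		/* offline CPUs are skipped entirely */
		if (!cpu_online(i))
			continue;

		hctx = q->mq_ops->map_queue(q, i);
		cpumask_set_cpu(i, hctx->cpumask);
		ctx->index_hw = hctx->nr_ctx;	/* slot in hctx->ctxs[] */
		hctx->ctxs[hctx->nr_ctx++] = ctx;
	}

Since offline CPUs are skipped, the ctx of a CPU that has never been
online keeps index_hw == 0 from allocation time, which collides with
the slot of whichever ctx was mapped first.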

This problem is reproducible even with a single hw queue. (The script
in the cover letter can reproduce it with a single hw queue.)

>> flush_busy_ctxs() only retrieves the requests from software queues
>> which are marked busy. So the request just inserted is ignored as
>> the corresponding bit position is not busy.
>
> Before making the remap in blk_mq_queue_reinit() for the CPU topology
> change, the request queue will be put into freezing first, and all
> requests inserted into hctx 0 should be retrieved and scheduled out.
> So how can the request be ignored by flush_busy_ctxs()?

For example, suppose there is a single hw queue (hctx) and two CPU
queues (ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 has just been
onlined, a request is inserted into ctx1->rq_list, and bit 0 gets set
in the pending bitmap, because ctx1->index_hw is still zero.

Then, while running the hw queue, flush_busy_ctxs() finds bit 0 set in
the pending bitmap and tries to retrieve requests from
hctx->ctxs[0]->rq_list. But hctx->ctxs[0] is ctx0, so the request in
ctx1->rq_list is ignored.
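
Expressed as (simplified) code, the flush walk effectively does the
following ("pending" here stands in for the bits of hctx->ctx_map; the
real flush_busy_ctxs() iterates a per-word bitmap, but the indexing is
what matters):

	/* each pending bit is used as an index into hctx->ctxs[] */
	for_each_set_bit(bit, pending, hctx->nr_ctx) {
		struct blk_mq_ctx *ctx = hctx->ctxs[bit];

		clear_bit(bit, pending);
		spin_lock(&ctx->lock);
		list_splice_tail_init(&ctx->rq_list, flush_list);
		spin_unlock(&ctx->lock);
	}

With the stale ctx1->index_hw == 0, bit 0 resolves to hctx->ctxs[0],
which is ctx0 with an empty rq_list, while the request sits stranded
on ctx1->rq_list.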