Re: [PATCH 5/5] nvme: use __blk_mq_complete_request in timeout path

From: jianchao.wang
Date: Wed Jun 20 2018 - 22:09:21 EST


Hi Christoph

Thanks for your kind response.

On 06/20/2018 10:39 PM, Christoph Hellwig wrote:
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 73a97fc..2a161f6 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -1203,6 +1203,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>> nvme_warn_reset(dev, csts);
>> nvme_dev_disable(dev, false);
>> nvme_reset_ctrl(&dev->ctrl);
>> + __blk_mq_complete_request(req);
>> return BLK_EH_DONE;
>> }
>>
>> @@ -1213,6 +1214,11 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>> dev_warn(dev->ctrl.device,
>> "I/O %d QID %d timeout, completion polled\n",
>> req->tag, nvmeq->qid);
>> + /*
>> + * nvme_end_request will invoke blk_mq_complete_request,
>> + * it will do nothing for this timed out request.
>> + */
>> + __blk_mq_complete_request(req);
>
> And this clearly is bogus. We want to iterate over the tagset
> and cancel all requests, not do that manually here.
>
> That was the whole point of the original change.
>

For nvme-pci, we do have an issue: when nvme_reset_work->nvme_dev_disable returns, the timeout path may still be
running, and the nvme_dev_disable invoked by the timeout path will then race with nvme_reset_work.
However, this hole already exists today without my changes; it is just narrower.
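
Roughly, the interleaving I have in mind looks like this. It is only a simplified sketch: it shows just the
calls named above, and the exact point where the timeout path overlaps with the reset is an assumption.

    nvme_reset_work                       timeout path (nvme_timeout)
    ---------------                       ---------------------------
    nvme_dev_disable()
      returns                             still running for a timed-out request
    rest of the reset proceeds
                                          nvme_dev_disable(dev, false)
                                            <- races with nvme_reset_work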

Thanks
Jianchao