[PATCH AUTOSEL 4.19 53/65] blk-mq: fix a hung issue when fsync

From: Sasha Levin
Date: Sat Feb 23 2019 - 16:23:21 EST


From: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>

[ Upstream commit 85bd6e61f34dffa8ec2dc75ff3c02ee7b2f1cbce ]

Florian reported a io hung issue when fsync(). It should be
triggered by following race condition.

data + post flush a flush

blk_flush_complete_seq
case REQ_FSEQ_DATA
blk_flush_queue_rq
issued to driver blk_mq_dispatch_rq_list
try to issue a flush req
failed due to NON-NCQ command
.queue_rq return BLK_STS_DEV_RESOURCE

request completion
req->end_io // doesn't check RESTART
mq_flush_data_end_io
case REQ_FSEQ_POSTFLUSH
blk_kick_flush
do nothing because previous flush
has not been completed
blk_mq_run_hw_queue
insert rq to hctx->dispatch
due to RESTART is still set, do nothing

To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io
with blk_mq_sched_restart to check and clear the RESTART flag.

Fixes: bd166ef1 (blk-mq-sched: add framework for MQ capable IO schedulers)
Reported-by: Florian Stecker <m19@xxxxxxxxxxxxxxxxx>
Tested-by: Florian Stecker <m19@xxxxxxxxxxxxxxxxx>
Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
block/blk-flush.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index ce41f666de3e1..76487948a27fa 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -424,7 +424,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);

- blk_mq_run_hw_queue(hctx, true);
+ blk_mq_sched_restart(hctx);
}

/**
--
2.19.1