Re: [PATCH v1 1/2] scsi: ufs: Fix unbalanced scsi_block_reqs_cnt caused by ufshcd_hold()

From: Can Guo
Date: Tue Nov 03 2020 - 08:28:31 EST


On 2020-11-03 15:07, Stanley Chu wrote:
Hi Can,

On Mon, 2020-11-02 at 22:24 -0800, Can Guo wrote:
The scsi_block_reqs_cnt increased in ufshcd_hold() is supposed to be
decreased back in ufshcd_ungate_work() in a paired way. However, if
specific ufshcd_hold/release sequences are met, it is possible that
scsi_block_reqs_cnt is increased twice but only one ungate work is
queued. To make sure scsi_block_reqs_cnt is handled by ufshcd_hold() and

Just curious that how could this be possible? Would you have some failed
examples?


[1] One gate_work() is in the workqueue, not yet executed, now clk state == REQ_CLKS_OFF.
[2] ufshcd_queuecommand() calls ufshcd_hold(async == ture) -> active_req++ -> scsi_block_reqs_cnt++ -> REQ_CLKS_ON -> queue ungate work -> active_req-- -> return -EAGAIN.
[3] Now gate_work() starts to run, but since the clk state is REQ_CLKS_ON, gate_work() just sets clk state to CLKS_ON and bail.
[3] Someone calls ufshcd_hold(async == false) -> do something -> ufshcd_release() -> clk state is changed to REQ_CLKS_OFF. Note that, till now, ungate_work() is still in the work queue, not executed yet.
[4] Now, if someone calls ufshcd_hold(), we will hit the issue.

Above sequence is a very common clk gate/ungate sequence. The issue
is because ungate_work is queued but cannot be executed in time. In my
case, I see the ungate_work is somehow delayed for about 150ms. This
change has been tested by customers on multiple platforms. And you
can tell from the code that it won't break anything. :)

Thanks,

Can Guo.

ufshcd_ungate_work() in a paired way, increase it only if queue_work()
returns true.

Signed-off-by: Can Guo <cang@xxxxxxxxxxxxxx>
Reviewed-by: Hongwu Su <hongwus@xxxxxxxxxxxxxx>
---
drivers/scsi/ufs/ufshcd.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 847f355..efa7d86 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1634,12 +1634,12 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
*/
/* fallthrough */
case CLKS_OFF:
- ufshcd_scsi_block_requests(hba);
hba->clk_gating.state = REQ_CLKS_ON;
trace_ufshcd_clk_gating(dev_name(hba->dev),
hba->clk_gating.state);
- queue_work(hba->clk_gating.clk_gating_workq,
- &hba->clk_gating.ungate_work);
+ if (queue_work(hba->clk_gating.clk_gating_workq,
+ &hba->clk_gating.ungate_work))
+ ufshcd_scsi_block_requests(hba);
/*
* fall through to check if we should wait for this
* work to be done or not.

Thanks,
Stanley Chu