On 23/03/21 5:13 pm, Asutosh Das (asd) wrote:
On 3/22/2021 11:12 PM, Adrian Hunter wrote:
On 22/03/21 9:53 pm, Asutosh Das (asd) wrote:I'm referring to the pm_runtime_put(sdev_ufs_device) after all the links are setup, that you suggested to add.
On 3/19/2021 10:47 AM, Adrian Hunter wrote:
On 19/03/21 2:35 am, Asutosh Das wrote:
During runtime-suspend of ufs host, the scsi devices are
already suspended and so are the queues associated with them.
But the ufs host sends SSU to wlun during its runtime-suspend.
During the process blk_queue_enter checks if the queue is not in
suspended state. If so, it waits for the queue to resume, and never
comes out of it.
The commit
(d55d15a33: scsi: block: Do not accept any requests while suspended)
adds the check if the queue is in suspended state in blk_queue_enter().
Call trace:
__switch_to+0x174/0x2c4
__schedule+0x478/0x764
schedule+0x9c/0xe0
blk_queue_enter+0x158/0x228
blk_mq_alloc_request+0x40/0xa4
blk_get_request+0x2c/0x70
__scsi_execute+0x60/0x1c4
ufshcd_set_dev_pwr_mode+0x124/0x1e4
ufshcd_suspend+0x208/0x83c
ufshcd_runtime_suspend+0x40/0x154
ufshcd_pltfrm_runtime_suspend+0x14/0x20
pm_generic_runtime_suspend+0x28/0x3c
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
rpm_idle+0x158/0x228
pm_runtime_work+0x84/0xac
process_one_work+0x1f0/0x470
worker_thread+0x26c/0x4c8
kthread+0x13c/0x320
ret_from_fork+0x10/0x18
Fix this by registering ufs device wlun as a scsi driver and
registering it for block runtime-pm. Also make this as a
supplier for all other luns. That way, this device wlun
suspends after all the consumers and resumes after
hba resumes.
Co-developed-by: Can Guo <cang@xxxxxxxxxxxxxx>
Signed-off-by: Can Guo <cang@xxxxxxxxxxxxxx>
Signed-off-by: Asutosh Das <asutoshd@xxxxxxxxxxxxxx>
I have some more comments that may help straighten things out.
Also please look at ufs_debugfs_get_user_access() and
ufs_debugfs_put_user_access() that now need to scsi_autopm_get/put_device
sdev_ufs_device.
It would also be good if you could re-base on linux-next.
Hi Adrian
Thanks for the comments.
I agree moving the code to wlun probe and other changes.
But it looks to me that it may not fully solve the issue.
Please let me explain my understanding on this:
(Please refer to the logs in v10)
scsi_autopm_*() are invoked on a sdev.
pm_runtime_get_suppliers()/rpm_put_suppliers() are on the supplier device.
For the device wlun:
slave_configure():
- doesn't set the rpm_autosuspend
- pm_runtime_getnoresume()
scsi_sysfs_add_sdev():
- pm_runtime_forbid()
- scsi_autopm_get_device()
- device_add()
- ufshcd_wl_probe()
- scsi_autopm_put_device()
For all other scsi devices:
slave_alloc():
- ufshcd_setup_links()
Say all link_add: pm_runtime_put(&hba->sdev_ufs_device->sdev_gendev);
With DL_FLAG_RPM_ACTIVE, links will 'get' not 'put'
Ok
Correct. What'd stop the sd device from suspending?slave_configure():
- set rpm_autosuspend
scsi_sysfs_add_sdev():
- scsi_autopm_get_device()
- device_add() -> schedules an async probe()
- scsi_autopm_put_device() - (1)
Now the rpm_put_suppliers() can be invoked *after* pm_runtime_get_suppliers() of the async probe(), since both are running in different contexts.
Only if the sd device suspends.
We should be stopping the sd device from suspending here - imho.
You mean for performance reasons. That is something we canNot for performance reasons. I meant to say that this issue can be fixed if we stop the sd devices from suspending until the sd_probe() is completed.
look at, but let's get it working first.
I think the issue is in the sequence of events.
I'm referring to the logs that I pasted before:In that case, the usage_count of supplier would be decremented until rpm_active of this link becomes 1.
Right, because the sd device suspended.
Now the pm_runtime_get_suppliers() expects the link_active to be more than 1.
Not sure what you mean here. pm_runtime_*put*_suppliers() won't do anything if the link count is 1.
[ 6.941267][ T7] scsi 0:0:0:4: rpm_put_suppliers: [BEF] Supp (0:0:0:49488) usage_count: 4 rpm_active: 3
------ T196 Context comes in while T7 is running ----------
[ 6.941466][ T196] scsi 0:0:0:4: pm_runtime_get_suppliers: (0:0:0:49488): supp: usage_count: 5 rpm_active: 4
--------------------------------------------------------------
[ 7.788397][ T7] scsi 0:0:0:4: rpm_put_suppliers: [AFT] Supp (0:0:0:49488) usage_count: 2 rpm_active: 1
I meant to say that, if the rpm_put_suppliers() is invoked after the pm_runtime_get_suppliers() as is seen above then the link_active may become 1 even *after* pm_runtime_get_suppliers() is invoked.
I'm referring to the pm_runtime_get_suppliers() invoked from:
driver_probe_device() - say for, sd 0:0:0:x
|- pm_runtime_get_suppliers() - for sd 0:0:0:49488
I am hoping that was the problem that Rafael's revert dealt with.
In the good case it'd drop from 3 to 2. But in the bad case, I see that it drops to 1. That's when the supplier suspends before the consumer.Correct, but it'd again schedule a suspend (since autosuspend is enabled now) at the end of the sd_probe().
Now then, there comes a time, that when sd_probe() schedules a suspend, the supplier usage_count becomes 0 and the link_active becomes 1.
And the supplier suspends before the consumer.
sd probe first resumes the sd device which will resume the supplier.
Thereafter, pm_runtime_put_suppliers()(sd 0:0:0:49488) is invoked from driver_probe_device() which had actually invoked sd_probe().
That'd make the link_active to 1 even when sd 0:0:0:x is active.
If sd 0:0:0:x is active then rpm_get_suppliers() still has +1 rpm_active. pm_runtime_get_suppliers() also has +1 rpm_active.
i.e. rpm_active is 3. If rpm_put_suppliers() is called, it means sd 0:0:0:x has really runtime suspended (not just waiting for autosuspend. Otherwise when the probe ends pm_runtime_put_suppliers() will drop rpm_active from 3 to 2.
Sure, I've pushed the changes v13 today.
But it is a bit theoretical. Let's try it and see.
Essentially, we should stop the sd device from runtime suspending until it's probe is done. Then allow the same. Does it make sense?
So I was wondering, what'd make sure that the pm_runtime_get_suppliers() from driver_probe_device() is invoked after scsi_autopm_put_device() (1) finishes the rpm_put_suppliers().
Am not sure if I'm missing something in this.
Do you think, the current changes alone can fix the above issue?
Yes, but let's see.
Please let me know what you think.
I really think it would be good to try the changes that have been identified and see how it behaves.
Then go from there.