Re: [PATCH RFC 2/2] scsi: ufshcd: Fix device links when BOOT WLUN fails to probe

From: Rafael J. Wysocki
Date: Thu Jul 08 2021 - 11:12:34 EST


On Thu, Jul 8, 2021 at 5:03 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Thu, Jul 8, 2021 at 4:17 PM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> >
> > On 8/07/21 3:31 pm, Rafael J. Wysocki wrote:
> > > On Wed, Jul 7, 2021 at 7:49 PM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> > >>
> > >> On 7/07/21 8:39 pm, Greg Kroah-Hartman wrote:
> > >>> On Wed, Jul 07, 2021 at 08:29:48PM +0300, Adrian Hunter wrote:
> > >>>> If a LUN fails to probe (e.g. absent BOOT WLUN), the device will not have
> > >>>> been registered but can still have a device link holding a reference to the
> > >>>> device. The unwanted device link will prevent runtime suspend indefinitely,
> > >>>> and cause some warnings if the supplier is ever deleted (e.g. by unbinding
> > >>>> the UFS host controller). Fix by explicitly deleting the device link when
> > >>>> SCSI destroys the SCSI device.
> > >>>>
> > >>>> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> > >>>> ---
> > >>>> drivers/scsi/ufs/ufshcd.c | 7 +++++++
> > >>>> 1 file changed, 7 insertions(+)
> > >>>>
> > >>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > >>>> index 708b3b62fc4d..483aa74fe2c8 100644
> > >>>> --- a/drivers/scsi/ufs/ufshcd.c
> > >>>> +++ b/drivers/scsi/ufs/ufshcd.c
> > >>>> @@ -5029,6 +5029,13 @@ static void ufshcd_slave_destroy(struct scsi_device *sdev)
> > > >>>>  		spin_lock_irqsave(hba->host->host_lock, flags);
> > > >>>>  		hba->sdev_ufs_device = NULL;
> > > >>>>  		spin_unlock_irqrestore(hba->host->host_lock, flags);
> > > >>>> +	} else {
> > > >>>> +		/*
> > > >>>> +		 * If a LUN fails to probe (e.g. absent BOOT WLUN), the device
> > > >>>> +		 * will not have been registered but can still have a device
> > > >>>> +		 * link holding a reference to the device.
> > > >>>> +		 */
> > > >>>> +		device_links_scrap(&sdev->sdev_gendev);
> > >>>
> > >>> What created that link? And why did it do that before probe happened
> > >>> successfully?
> > >>
> > >> The same driver created the link.
> > >>
> > >> The documentation seems to say it is allowed to, if it is the consumer.
> > >> From Documentation/driver-api/device_link.rst
> > >>
> > >> Usage
> > >> =====
> > >>
> > >> The earliest point in time when device links can be added is after
> > >> :c:func:`device_add()` has been called for the supplier and
> > >> :c:func:`device_initialize()` has been called for the consumer.
> > >
> > > Yes, this is allowed, but if you've added device links to a device
> > > object that is not going to be registered after all, you are
> > > responsible for doing the cleanup.
> > >
> > > Why can't you call device_link_del() directly on those links?
> > >
> > > Or device_link_remove() if you don't want to deal with link pointers?
> > >
> >
> > Those only work for DL_FLAG_STATELESS device links, but we use only
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE flags.
>
> So I'd probably modify device_link_remove() to check if the consumer
> device has been registered and run __device_link_del() directly
> instead of device_link_put_kref() if it hasn't.
>
> Or add an argument to it to force the removal.

Or even modify device_link_put_kref() like this:

 static void device_link_put_kref(struct device_link *link)
 {
 	if (link->flags & DL_FLAG_STATELESS)
 		kref_put(&link->kref, __device_link_del);
+	else if (!device_is_registered(link->consumer))
+		__device_link_del(&link->kref);
 	else
 		WARN(1, "Unable to drop a managed device link reference\n");
 }
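
To make the branch order above easier to follow, here is a minimal
userspace sketch of the same decision logic. It is not kernel code: every
mock_* name, the put_result enum, the try_put() helper and the MOCK_DL_FLAG_*
values are assumptions introduced only for illustration, and the refcount
is a plain counter standing in for the kref.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative flag bits only; assumed values, not taken from the kernel headers. */
#define MOCK_DL_FLAG_STATELESS  (1u << 0)
#define MOCK_DL_FLAG_PM_RUNTIME (1u << 2)
#define MOCK_DL_FLAG_RPM_ACTIVE (1u << 3)

/* Hypothetical stand-in for struct device: only tracks registration. */
struct mock_device {
	bool registered;
};

/* Hypothetical stand-in for struct device_link. */
struct mock_link {
	unsigned int flags;
	unsigned int refcount;	/* plain counter standing in for the kref */
	struct mock_device *consumer;
	bool deleted;
};

enum put_result {
	PUT_REF_DROPPED,	/* stateless link, references remain */
	PUT_LINK_DELETED,	/* link was torn down */
	PUT_WARNED		/* managed link, registered consumer: refuse */
};

static void mock_link_del(struct mock_link *link)
{
	link->deleted = true;
}

/* Mirrors the three branches of the proposed device_link_put_kref(). */
static enum put_result mock_link_put_kref(struct mock_link *link)
{
	if (link->flags & MOCK_DL_FLAG_STATELESS) {
		/* Stateless links are reference counted by their creator. */
		if (--link->refcount == 0) {
			mock_link_del(link);
			return PUT_LINK_DELETED;
		}
		return PUT_REF_DROPPED;
	}

	if (!link->consumer->registered) {
		/* Managed link whose consumer was never registered: delete it. */
		mock_link_del(link);
		return PUT_LINK_DELETED;
	}

	fprintf(stderr, "Unable to drop a managed device link reference\n");
	return PUT_WARNED;
}

/* Convenience helper: build a link with the given state and try to put it. */
enum put_result try_put(unsigned int flags, unsigned int refs, bool registered)
{
	struct mock_device consumer = { .registered = registered };
	struct mock_link link = {
		.flags = flags,
		.refcount = refs,
		.consumer = &consumer,
		.deleted = false,
	};

	return mock_link_put_kref(&link);
}
```

In this model, the UFS case from the thread corresponds to calling try_put()
with the PM_RUNTIME and RPM_ACTIVE bits set and registered == false, which
takes the new middle branch instead of the warning.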