Re: [PATCH v2 0/9] driver core: Fix some device links issues and add "consumer autoprobe" flag

From: Ulf Hansson
Date: Tue Feb 05 2019 - 03:16:30 EST


On Mon, 4 Feb 2019 at 12:45, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Mon, Feb 4, 2019 at 12:40 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >
> > On Fri, Feb 1, 2019 at 4:18 PM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> > >
> > > On Fri, 1 Feb 2019 at 02:04, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> > > >
> > > > Hi Greg et al,
> > > >
> > > > This is a combination of the two device links series I have posted
> > > > recently (https://lore.kernel.org/lkml/2493187.oiOpCWJBV7@xxxxxxxxxxxxxx/
> > > > and https://lore.kernel.org/lkml/2405639.4es7pRLqn0@xxxxxxxxxxxxxx/) rebased
> > > > on top of your driver-core-next branch.
> > > >
> > > > Recently I have been looking at the device links code because of the
> > > > recent discussion on possibly using them in the DRM subsystem (see for
> > > > example https://marc.info/?l=linux-pm&m=154832771905309&w=2) and I have
> > > > found a few issues in that code which should be addressed by this patch
> > > > series. Please refer to the patch changelogs for details.
> > > >
> > > > None of the problems addressed here should be manifesting themselves in
> > > > mainline kernel today, but if there are more device links users in the
> > > > future, they most likely will be encountered sooner or later. Also they
> > > > need to be fixed for the DRM use case to be supported IMO.
> > > >
> > > > On top of this the series makes device links support the "composite device"
> > > > use case in the DRM subsystem mentioned above (essentially, the last patch
> > > > in the series is for that purpose).
> > > >
> > >
> > > Rafael, Greg, I have reviewed patch 1 -> 7, they all look good to me.
> > >
> > > If not too late, feel free to add for the first 7 patches:
> > >
> > > Reviewed-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >
> > Thanks!
> >
> > > Although, I want to point out one problem that I have found when using
> > > device links. I believe it's already there, even before this series,
> > > but just wanted to describe it for your consideration.
> > >
> > > This is what happens:
> > > I have a platform driver that is being probed. During ->probe() the driver
> > > adds a device link like this:
> > >
> > > link = device_link_add(consumer-dev, supplier-dev, DL_FLAG_STATELESS |
> > > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
> > >
> > > At some point later in ->probe(), the driver realizes that it must
> > > remove the device link, either because it encountered an error or
> > > simply because it doesn't need the device link to be there anymore.
> > > Thus it calls:
> > >
> > > device_link_del(link);
> > >
> > > When the driver's probe has finished, the runtime PM usage count for the
> > > supplier-dev remains increased to 1 and thus it never becomes runtime
> > > suspended.
> >
> > OK, so this is a tricky one.
> >
> > With this series applied, if the link actually goes away after the
> > cleanup device_link_del(), device_link_free() should take care of
> > dropping the PM-runtime count of the supplier. If it doesn't do that,
> > there is a mistake in the code that needs to be fixed.

Unless this belongs to your "distracted part", I think this is what is
happening, and thus it is a problem.
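
To make the scenario concrete, here is a toy user-space model of it. This
is only a sketch and not kernel code: struct toy_dev, struct toy_link and
all toy_*() helpers are hypothetical names, and the counting rule (a link
created with an RPM_ACTIVE-like flag takes one usage reference on the
supplier) is an assumption based on the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; NOT the kernel's struct device / struct device_link. */
struct toy_dev {
	int usage_count;	/* models the runtime PM usage count */
};

struct toy_link {
	struct toy_dev *supplier;
	bool rpm_active;	/* created with an RPM_ACTIVE-like flag */
};

/* A device may runtime suspend only when nobody holds a usage reference. */
static bool toy_can_suspend(const struct toy_dev *dev)
{
	return dev->usage_count == 0;
}

/* Models link creation: an RPM_ACTIVE-like link takes a supplier reference. */
static struct toy_link toy_link_add(struct toy_dev *supplier, bool rpm_active)
{
	struct toy_link link = { .supplier = supplier, .rpm_active = rpm_active };

	if (rpm_active)
		supplier->usage_count++;
	return link;
}

/* Models the reported behaviour: the link goes away, the reference stays. */
static void toy_link_del_leaky(struct toy_link *link)
{
	link->supplier = NULL;
}

/* Models the expected behaviour: deleting the link drops the reference. */
static void toy_link_del_balanced(struct toy_link *link)
{
	if (link->rpm_active) {
		assert(link->supplier->usage_count > 0);
		link->supplier->usage_count--;
	}
	link->supplier = NULL;
}
```

In this model, after toy_link_add() followed by toy_link_del_leaky(), the
supplier's usage_count stays at 1 and toy_can_suspend() returns false,
which matches what I observe. With toy_link_del_balanced() the count goes
back to 0.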

> >
> > However, if the link doesn't go away after the cleanup
> > device_link_del(), the supplier's PM-runtime count will not be
> > dropped, because the core doesn't know whether or not the
> > device_link_del() has been called by the same entity that caused the
> > supplier's PM-runtime count to be incremented. For example, if the
> > consumer device is suspended after the device_link_add() that
> > incremented the supplier's PM-runtime count and then suspended again,
>
> I was distracted while writing this, sorry for the confusion.
>
> So let me rephrase:
>
> For example, if the consumer device is suspended after the
> device_link_add() that incremented the supplier's PM-runtime count and
> then resumed again, the rpm_active refcount will be greater than one
> because of the last resume and not because of the initial link
> creation. In that case, dropping the supplier's PM-runtime count on
> link deletion may not work as expected.

I see what you are saying, and I must admit, looking at the code, that
it has become rather complicated. For good reasons, I assume, of
course.

Anyway, I will play a little bit more with my tests to see what I can find out.
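
For reference, this is how I understand the ambiguity you describe, again
as a toy user-space model and not kernel code. struct rpm_link and the
rpm_*() helpers are hypothetical names, and the counting rules (rpm_active
starts at one, each supplier reference adds one, a consumer suspend drops
everything above one) are assumptions taken from the explanation above.

```c
#include <assert.h>

/* Toy model of a link's rpm_active refcount; NOT the kernel structure. */
struct rpm_link {
	int rpm_active;		/* assumed: 1 means "no supplier reference held" */
	int supplier_usage;	/* models the supplier's PM-runtime usage count */
};

/* Models link creation; an RPM_ACTIVE-like flag takes one reference. */
static void rpm_link_init(struct rpm_link *link, int rpm_active_flag)
{
	link->rpm_active = 1;
	link->supplier_usage = 0;
	if (rpm_active_flag) {
		link->rpm_active++;
		link->supplier_usage++;
	}
}

/* Models a consumer resume: one more reference on the supplier. */
static void rpm_consumer_resume(struct rpm_link *link)
{
	link->rpm_active++;
	link->supplier_usage++;
}

/* Models a consumer suspend: drop every reference held above one. */
static void rpm_consumer_suspend(struct rpm_link *link)
{
	while (link->rpm_active > 1) {
		assert(link->supplier_usage > 0);
		link->rpm_active--;
		link->supplier_usage--;
	}
}
```

A link that was only just created with the flag set, and a link that has
since gone through rpm_consumer_suspend() and rpm_consumer_resume(), both
end up with rpm_active == 2 in this model. So the value alone does not
tell whether the extra reference comes from the link creation or from the
last resume.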

>
> > Arguably, device_link_del() could be made automatically drop the
> > supplier's PM-runtime count by one if the link's rpm_active refcount
> > is not one, but there will be failing scenarios in that case too
> > AFAICS.

Let's see.

Kind regards
Uffe