Re: [PATCH] driver core: Drop devices_kset_move_last() call from really_probe()

From: Bjorn Helgaas
Date: Mon Jul 09 2018 - 18:06:57 EST


[+cc Kishon]

On Mon, Jul 9, 2018 at 4:35 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Mon, Jul 9, 2018 at 3:57 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> > On Fri, Jul 6, 2018 at 5:01 AM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> >>
> >> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >>
> >> The devices_kset_move_last() call in really_probe() is a mistake
> >> as it may cause parents to follow children in the devices_kset list
> >> which then causes system shutdown to fail. Namely, if a device has
> >> children before really_probe() is called for it (which is not
> >> uncommon), that call will cause it to be reordered after the children
> >> in the devices_kset list and the ordering of that list will not
> >> reflect the correct device shutdown order.
> >>
> >> Also it causes the devices_kset list to be constantly reordered
> >> until all drivers have been probed which is totally pointless
> >> overhead in the majority of cases.
> >>
> >> For that reason, revert the really_probe() modifications made by
> >> commit 52cdbdd49853.
> >
> > I'm sure you've considered this, but I can't figure out whether this
> > patch will reintroduce the problem that was solved by 52cdbdd49853.
> > That patch updated two places: (1) really_probe(), the change you're
> > reverting here, and (2) device_move().
> >
> > device_move() is only called from 4-5 places, none of which look
> > related to the problem fixed by 52cdbdd49853, so it seems like that
> > problem was probably resolved by the hunk you're reverting.
>
> That's right, but I don't want to revert all of it. The other parts
> of it are kind of useful as they make the handling of the devices_kset
> list be consistent with the handling of dpm_list.
>
> The hunk I'm reverting, however, is completely off. It not only is
> incorrect (as per the above), but it also causes the devices_kset list
> and dpm_list to be handled differently.

If I understand correctly, you are saying:

- the 52cdbdd49853 really_probe() hunk fixed a problem, but
- that hunk was the wrong fix for it, and
- this patch removes the wrong fix (and probably reintroduces the problem)

If devices_kset is supposed to be ordered so children follow parents,
I agree the really_probe() hunk doesn't make much sense because the
parent/child relation is determined by the circuit design, not by the
probe order.

It just seems like it's worth being clear that we're reintroducing the
problem fixed by 52cdbdd49853, so it needs to be solved a different
way. Ideally that would be done before this patch so there's not a
regression, and this changelog could mention what's happening.

> It had attempted to fix something, but it failed miserably at that.

If you're saying that 52cdbdd49853 *tried* to fix a DRA7XX_evm reboot
problem, but in fact, it did not fix that problem, then I guess there
should be no issue with reverting that hunk.

> >> Fixes: 52cdbdd49853 (driver core: correct device's shutdown order)
> >> Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@xxxxxxxxxxxxxx/
> >> Reported-by: Pingfan Liu <kernelfans@xxxxxxxxx>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >> ---
> >> drivers/base/dd.c | 8 --------
> >> 1 file changed, 8 deletions(-)
> >>
> >> Index: linux-pm/drivers/base/dd.c
> >> ===================================================================
> >> --- linux-pm.orig/drivers/base/dd.c
> >> +++ linux-pm/drivers/base/dd.c
> >> @@ -434,14 +434,6 @@ re_probe:
> >> goto probe_failed;
> >> }
> >>
> >> - /*
> >> - * Ensure devices are listed in devices_kset in correct order
> >> - * It's important to move Dev to the end of devices_kset before
> >> - * calling .probe, because it could be recursive and parent Dev
> >> - * should always go first
> >> - */
> >> - devices_kset_move_last(dev);
> >> -
> >> if (dev->bus->probe) {
> >> ret = dev->bus->probe(dev);
> >> if (ret)
> >>