Re: [PATCH] driver core: Reorder devices on successful probe

From: Rafael J. Wysocki
Date: Thu Dec 03 2020 - 13:18:30 EST


On Thu, Dec 3, 2020 at 6:58 PM Thierry Reding <thierry.reding@xxxxxxxxx> wrote:
>
> From: Thierry Reding <treding@xxxxxxxxxx>
>
> Device drivers usually depend on the fact that the devices that they
> control are suspended in the same order that they were probed in. In
> most cases this is already guaranteed via deferred probe.
>
> However, there's one case where this can still break: if a device is
> instantiated before a dependency (for example if it appears before the
> dependency in device tree) but gets probed only after the dependency is
> probed. Instantiation order would cause the dependency to get probed
> later, in which case probe of the original device would be deferred and
> the suspend/resume queue would get reordered properly. However, if the
> dependency is provided by a built-in driver and the device depending on
> that driver is controlled by a loadable module, which may only get
> loaded after the root filesystem has become available, we can be faced
> with a situation where the probe order ends up being different from the
> suspend/resume order.
>
> One example where this happens is on Tegra186, where the ACONNECT is
> listed very early in device tree (sorted by unit-address) and depends on
> BPMP (listed very late because it has no unit-address) for power domains
> and clocks/resets. If the ACONNECT driver is built-in, there is no
> problem because it will be probed before BPMP, causing a probe deferral
> and that in turn reorders the suspend/resume queue. However, if built as
> a module, it will end up being probed after BPMP, and therefore not
> result in a probe deferral, and therefore the suspend/resume queue will
> stay in the instantiation order. This in turn causes problems because
> ACONNECT will be resumed before BPMP, which will result in a hang
> because the ACONNECT's power domain cannot be powered on as long as the
> BPMP is still suspended.
>
> Fix this by always reordering devices on successful probe. This ensures
> that the suspend/resume queue is always in probe order and hence meets
> the natural expectations of drivers vs. their dependencies.
>
> Reported-by: Jonathan Hunter <jonathanh@xxxxxxxxxx>
> Signed-off-by: Thierry Reding <treding@xxxxxxxxxx>

Saravana had submitted a very similar patch (I don't have a pointer to
that one though) and I was against it at that time due to
overhead-related concerns. There still are some, but maybe that
doesn't matter in practice.

Also, I kind of expect this to blow up somewhere, but since I have no
examples ready from the top of my head, I think let's try and see, so:

Acked-by: Rafael. J. Wysocki <rafael@xxxxxxxxxx>

> ---
> drivers/base/dd.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 148e81969e04..cfc079e738bb 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -370,6 +370,13 @@ static void driver_bound(struct device *dev)
>
> device_pm_check_callbacks(dev);
>
> + /*
> + * Reorder successfully probed devices to the end of the device list.
> + * This ensures that suspend/resume order matches probe order, which
> + * is usually what drivers rely on.
> + */
> + device_pm_move_to_tail(dev);
> +
> /*
> * Make sure the device is no longer in one of the deferred lists and
> * kick off retrying all pending devices
> --
> 2.29.2
>