Re: [RFC][PATCH] driver core: Extend returning EPROBE_DEFER for two minutes after late_initcall

From: Rob Herring
Date: Sat Feb 15 2020 - 16:26:20 EST


On Thu, Feb 13, 2020 at 8:42 PM John Stultz <john.stultz@xxxxxxxxxx> wrote:
>
> On Thu, Feb 13, 2020 at 5:51 PM Rob Herring <robh@xxxxxxxxxx> wrote:
> > On Thu, Feb 13, 2020 at 6:44 PM John Stultz <john.stultz@xxxxxxxxxx> wrote:
> > > Due to commit e01afc3250255 ("PM / Domains: Stop deferring probe
> > > at the end of initcall"), along with commit 25b4e70dcce9
> > > ("driver core: allow stopping deferred probe after init") after
> > > late_initcall, drivers will stop getting EPROBE_DEFER, and
> > > instead see an error causing the driver to fail to load.
> > >
> > > That change causes trouble when trying to use many clk drivers
> > > as modules, as the clk modules may not load until much later
> > > after init has started. If a dependent driver loads and gets an
> > > error instead of EPROBE_DEFER, it won't try to reload later when
> > > the dependency is met, and will thus fail to load.
> > >
> > > Instead of reverting that patch, this patch tries to extend the
> > > time that EPROBE_DEFER is retruned by two minutes, to (hopefully)
> > > ensure that everything has had a chance to load.
> >
> > I think regulators already has some delay like this. We should use the
> > same timeouts.
>
> Sounds good. My memory was a bit foggy from the time I initially
> brought this up, and I looked briefly before sending this out and
> didn't find the regulator change you had mentioned. If you have a
> specific pointer (or patch name or something) let me know, otherwise
> I'll dig around later tonight/tomorrow.
>
> > We also have the 'deferred_probe_timeout' cmdline option. It's deemed
> > a debug option currently, but we could change that and change the
> > default.
>
> I looked at that code, but couldn't really make heads or tails of it.
> The initcalls_done is checked and returns before the
> deferred_probe_timeout is looked at, so I was guessing the
> deferred_probe_timeout was addressing a bit more subtle issue than
> what I was going for. If its really the same functionality, I'm happy
> to try to rework it.

Subsystems have to opt-in in the current code to stop deferring at
init done. Though this was extended with
driver_deferred_probe_check_state_continue() which defers until the
timeout (commit 62a6bc3a1e4f4).

The only purpose of the timeout was to dump out what devices are still
deferred. I don't see any point in having 2 timeouts, so make
deferred_probe_timeout do what you want it to do. I think that's
mainly default to a sensible timeout value and have the subsystems you
care about honor the timeout. Perhaps we don't need both
driver_deferred_probe_check_state() and
driver_deferred_probe_check_state_continue() and can just have the
latter if all the subsystems need to honor the timeout.

I think it would be good if the console printed something periodically
about waiting for drivers. That way the system doesn't appear hung for
30 or 120 sec before giving up.

Also, there was one issue in particular that I ran into. Consoles have
to be built in and probed before init is done. So if your UART has any
dependencies, then those dependencies also have to be built-in. I
don't think anyone working on the 'everything is a module' problem has
fixed that.

> > > Specifically, on db845c, this change allows us to set
> > > SDM_GPUCC_845, QCOM_CLK_RPMH and COMMON_CLK_QCOM as modules and
> > > get a working system, where as without it the display will fail
> > > to load.
> > >
> > > Cc: Alexander Graf <agraf@xxxxxxx>
> > > Cc: Rob Herring <robh@xxxxxxxxxx>
> > > Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx>
> > > Cc: Kevin Hilman <khilman@xxxxxxxxxx>
> > > Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> > > Cc: Pavel Machek <pavel@xxxxxx>
> > > Cc: Len Brown <len.brown@xxxxxxxxx>
> > > Cc: Todd Kjos <tkjos@xxxxxxxxxx>
> > > Cc: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx>
> > > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > Cc: linux-pm@xxxxxxxxxxxxxxx
> > > Fixes: e01afc3250255 ("PM / Domains: Stop deferring probe at the end of initcall")
> > > Fixes: 25b4e70dcce9 ("driver core: allow stopping deferred probe after init")
> >
> > We can debate the design, but those work as designed. So Fixes?
> >
>
> Well, clk module loading would have worked, and then stopped working
> with e01afc3250255, so it is a regression of sorts.

If there was a clock driver upstream where this worked, then I agree.

> And really the
> tags are mostly for making sure patches get applied to trees that
> backported these commits (and it's not my intention to shame a patch
> as broken. :)

But 'Fixes' implies 'apply to stable' which is not the same as 'apply
backport to a distro kernel'.

Rob