Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()

From: Greg Kroah-Hartman
Date: Wed Jun 12 2019 - 13:13:04 EST


On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@xxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi Saravana,
> > > > > >
> > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > comments on the original?
> > > > > >
> > > > > > Adding on a different aspect... The independent replies from three different
> > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > patch series. There were also some implementation issues brought out.
> > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > as they are not relevant until architecture issues are resolved.)
> > > > >
> > > > > Right, I'm not too worried about the implementation issues before we
> > > > > settle on the architectural issues. Those are easy to fix.
> > > > >
> > > > > Honestly, the main points that the maintainers raised are:
> > > > > 1) This is a configuration property and not describing the device.
> > > > > Just use the implicit dependencies coming from existing bindings.
> > > > >
> > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > property. But even if that's not something the maintainers can agree
> > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > of the devices from probing till the end of time. So even if the
> > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > decide the dependencies, we still need some means to override the
> > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > correct?
> > > > >
> > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > >
> > > > > I explained why this patch series is not just about optimizing probe
> > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > systems/archs too, but I can't speak for those).
> > > > >
> > > > > 3) Concerns about backwards compatibility
> > > > >
> > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > "depends-on" as an override what we could do to keep backwards
> > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > >
> > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > functionalities but a limited driver.
> > > >
> > > > <snip>
> > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > load device drivers as modules for Android.
> > > >
> > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > the drawing board" is the ask, could you please point us in the right
> > > > direction?
> > >
> > > Use the dependencies which are already there in DT. That's clocks,
> > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > going to accept duplicating all those dependencies in DT. The downside
> > > for the kernel is you have to address these one by one and can't have
> > > a generic property the driver core code can parse. After that's in
> > > place, then maybe we can consider handling any additional dependencies
> > > not already captured in DT. Once all that is in place, we can probably
> > > sort device and/or driver lists to optimize the probe order (maybe the
> > > driver core already does that now?).
> > >
> > > Get rid of the auto disabling of clocks and regulators in
> > > late_initcall. It's simply not a valid marker that boot is done when
> > > modules are involved. We probably can't get rid of it as lot's of
> > > platforms rely on that, so it will have to be opt out. Make it the
> > > platform's responsibility for ensuring a consistent state.
> > >
> > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > userspace in order to make progress if dependencies are missing.
> >
> > People have tried to do this multiple times, and you never really know
> > when "boot is done" due to busses that have discoverable devices and
> > async probing of other busses.
>
> Yes, I know which is why I proposed the second name with more limited
> meaning/function.

I still don't want to have the kernel have to rely on this.

> > You do know "something" when you pivot to a new boot disk, and when you
> > try to load init, but given initramfs and the fact that modules are
> > usually included on them, that's not really a good indication that
> > anything is "finished".
> >
> > I don't want userspace to be responsible for telling the kernel, "hey
> > you should be finished now!", as that's an async notification that is
> > going to be ripe for problems.
>
> The usecase I care about here is when the DT has the dependency
> information, but the kernel doesn't have the driver and the dependency
> is never resolved.

Then we have the same situation as today and nothing different happens,
right?

> The same problem has to be solved with a
> 'depends-on' property. This easily happens with a new DT with added
> dependencies like pinctrl and an old kernel that doesn't have the
> "new" driver. Another example is IOMMUs. We need some way to say stop
> waiting for dependencies. It is really just a debug option (of course,
> how to prevent a debug option from being used in production?). This
> works now for built-in cases with the same late_initcall abuse.

What is a debug option? We need something "for real".

> Using late_initcall_sync as an indicator has all the same problems
> with userspace indicating boot finished. We should get rid of the
> late_initcall_sync abuses and stop trying to work around them.

I agree, but that's not the issue here.

> > I really like the "depends-on" information, as it shows a topology that
> > DT doesn't seem to be able to show today, yet we rely on it in the
> > kernel with the whole deferred probing mess. To me, there doesn't seem
> > to be any other way to properly "know" this.
>
> As I said, DT *does* have this dependency information already. The
> problem is the kernel probing doesn't use it. Fix that and then we can
> discuss dependencies the DT doesn't provide that the kernel needs.

Where can the kernel probing be fixed to use it? What am I missing that
can be done instead of what this patchset does?

> > > Or
> > > maybe just some timeout would be sufficient. I think this is probably
> > > more useful for development than in a shipping product. Even if you
> > > could fallback to polling mode instead of interrupts for example, I
> > > doubt you would want to in a product.
> >
> > timeouts suck. And do not work for shipping products. I want a device
> > with 100 modules that relys on DT to be able to boot just as fast as a
> > laptop with 100 modules that has all of the needed dependancies
> > described today in their bus topologies, because they used sane hardware
> > (i.e. PCI and ACPI). Why hurt embedded people just because their
> > hardware relies on a system where you have to loop for long periods of
> > time because DT can not show the topology correctly?
>
> I failed to list buses, but those too are already described in DT. Bus
> dependencies are handled already.

That's not my point, I know sane busses are described in DT :)

> > > You should also keep in mind that everything needed for a console has
> > > to be built in. Maybe Android can say the console isn't needed, but in
> > > general we can't.
> >
> > What does a console have to do with any of this?
>
> Do you want to move all the SoC support to modules and have a console?

That's not the issue/point here at all. I really could care less about
a console, the issue is that we need a way to take a "generic" arm64
kernel, with a load of modules (i.e. allmodconfig) and have it boot in a
sane amount of time.

Other arches do this today, why can't arm64? That is the issue here,
not console or late init calls.

> That doesn't work. Dependencies for the console have to be resolved
> before initcalls are done. I don't recall the exact details, but I did
> hit that issue when working on handling the missing dependency issues
> (25b4e70dcce9 driver core: allow stopping deferred probe after init).
>
> The point is that moving things to modules does introduce issues
> beyond just dependency tracking.

I totally agree, and yet here we both are typing on Linux machines where
this all "just works" today. Now to get that to work on a DT-based
system :)

thanks,

greg k-h