Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

From: Guenter Roeck
Date: Tue Feb 16 2021 - 22:09:24 EST


On Tue, Feb 16, 2021 at 06:39:55PM -0800, Saravana Kannan wrote:
> On Wed, Feb 10, 2021 at 1:21 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >
> > On 2/10/21 12:52 PM, Saravana Kannan wrote:
> > > On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> > >>
> > >> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> > >>> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> > >>>>
> > >>>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> > >>>>> Cyclic dependencies in some firmware was one of the last remaining
> > >>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > >>>>> dependencies don't block probing, set fw_devlink=on by default.
> > >>>>>
> > >>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > >>>>> only for systems with device tree firmware):
> > >>>>> * Significantly cuts down deferred probes.
> > >>>>> * Device probe is effectively attempted in graph order.
> > >>>>> * Makes it much easier to load drivers as modules without having to
> > >>>>> worry about functional dependencies between modules (depmod is still
> > >>>>> needed for symbol dependencies).
> > >>>>>
> > >>>>> If this patch prevents some devices from probing, it's very likely due
> > >>>>> to the system having one or more device drivers that "probe"/set up a
> > >>>>> device (DT node with compatible property) without creating a struct
> > >>>>> device for it. If we hit such cases, the device drivers need to be
> > >>>>> fixed so that they populate struct devices and probe them like normal
> > >>>>> device drivers so that the driver core is aware of the devices and their
> > >>>>> status. See [1] for an example of such a case.
> > >>>>>
> > >>>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@xxxxxxxxxxxxxx/
> > >>>>> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > >>>>
> > >>>> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> > >>>> trying to reboot. Reverting this patch fixes the problem. Bisect log
> > >>>> is attached.
> > >>>
> > >>> Thanks for the report Guenter. Can you please try this series?
> > >>> https://lore.kernel.org/lkml/20210205222644.2357303-1-saravanak@xxxxxxxxxx/
> > >>>
> > >>
> > >> Not this week. I have lots of reviews to complete before the end of the week,
> > >> with the 5.12 commit window coming up.
> > >
> > > Ok. By next week, all the fixes should be in linux-next too. So it
> > > should be easier if you choose to test.
> > >
> > >> Given the number of problems observed, I personally think that it is way
> > >> too early for this patch. We'll have no end of problems if it is applied
> > >> to the upstream kernel in the next commit window. Of course, that is just
> > >> my personal opinion.
> > >
> > > You had said "with 115 of 430 boot tests failing in -next" earlier.
> > > Just to be sure I understand it right, you are not saying this patch
> > > caused them all right? You are just saying that 115 general boot
> > > failures that might mask fw_devlink issues in some of them, right?
> > >
> >
> > Correct.
>
> Is it right to assume [1] fixed all known boot issues due to fw_devlink=on?
> [1] - https://lore.kernel.org/lkml/20210215224258.1231449-1-saravanak@xxxxxxxxxx/
>

I honestly don't know. Current status of -next in my tests is:

Build results:
total: 149 pass: 144 fail: 5
Qemu test results:
total: 432 pass: 371 fail: 61

This is for next-20210216. Newly introduced failures keep popping up. Some
of the failures have been persistent for weeks, so it is all but impossible
to say if affected platforms experience more than one failure.

Also, please keep in mind that my boot tests are very shallow, along the
line of "it boots, therefore it works". It only tests hardware which is
emulated by qemu and is needed for booting. It tests probably much less
than 1% of driver code. It can and should not be used for any useful
fw_devlink related test coverage.

Thanks,
Guenter