Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

From: Saravana Kannan
Date: Tue Feb 16 2021 - 22:15:20 EST


On Tue, Feb 16, 2021 at 7:05 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On Tue, Feb 16, 2021 at 06:39:55PM -0800, Saravana Kannan wrote:
> > On Wed, Feb 10, 2021 at 1:21 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> > >
> > > On 2/10/21 12:52 PM, Saravana Kannan wrote:
> > > > On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> > > >>
> > > >> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> > > >>> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> > > >>>>
> > > >>>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> > > >>>>> Cyclic dependencies in some firmware was one of the last remaining
> > > >>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > >>>>> dependencies don't block probing, set fw_devlink=on by default.
> > > >>>>>
> > > >>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > > >>>>> only for systems with device tree firmware):
> > > >>>>> * Significantly cuts down deferred probes.
> > > >>>>> * Device probe is effectively attempted in graph order.
> > > >>>>> * Makes it much easier to load drivers as modules without having to
> > > >>>>> worry about functional dependencies between modules (depmod is still
> > > >>>>> needed for symbol dependencies).
> > > >>>>>
> > > >>>>> If this patch prevents some devices from probing, it's very likely due
> > > >>>>> to the system having one or more device drivers that "probe"/set up a
> > > >>>>> device (DT node with compatible property) without creating a struct
> > > >>>>> device for it. If we hit such cases, the device drivers need to be
> > > >>>>> fixed so that they populate struct devices and probe them like normal
> > > >>>>> device drivers so that the driver core is aware of the devices and their
> > > >>>>> status. See [1] for an example of such a case.
> > > >>>>>
> > > >>>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@xxxxxxxxxxxxxx/
> > > >>>>> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > > >>>>
> > > >>>> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> > > >>>> trying to reboot. Reverting this patch fixes the problem. Bisect log
> > > >>>> is attached.
> > > >>>
> > > >>> Thanks for the report Guenter. Can you please try this series?
> > > >>> https://lore.kernel.org/lkml/20210205222644.2357303-1-saravanak@xxxxxxxxxx/
> > > >>>
> > > >>
> > > >> Not this week. I have lots of reviews to complete before the end of the week,
> > > >> with the 5.12 commit window coming up.
> > > >
> > > > Ok. By next week, all the fixes should be in linux-next too. So it
> > > > should be easier if you choose to test.
> > > >
> > > >> Given the number of problems observed, I personally think that it is way
> > > >> too early for this patch. We'll have no end of problems if it is applied
> > > >> to the upstream kernel in the next commit window. Of course, that is just
> > > >> my personal opinion.
> > > >
> > > > You had said "with 115 of 430 boot tests failing in -next" earlier.
> > > > Just to be sure I understand it right, you are not saying this patch
> > > > caused them all right? You are just saying that 115 general boot
> > > > failures that might mask fw_devlink issues in some of them, right?
> > > >
> > >
> > > Correct.
> >
> > Is it right to assume [1] fixed all known boot issues due to fw_devlink=on?
> > [1] - https://lore.kernel.org/lkml/20210215224258.1231449-1-saravanak@xxxxxxxxxx/
> >
>
> I honestly don't know. Current status of -next in my tests is:
>
> Build results:
> total: 149 pass: 144 fail: 5
> Qemu test results:
> total: 432 pass: 371 fail: 61
>
> This is for next-20210216. Newly introduced failures keep popping up. Some
> of the failures have been persistent for weeks, so it is all but impossible
> to say if affected platforms experience more than one failure.
>
> Also, please keep in mind that my boot tests are very shallow, along the
> line of "it boots, therefore it works". It only tests hardware which is
> emulated by qemu and is needed for booting. It tests probably much less
> than 1% of driver code. It can and should not be used for any useful
> fw_devlink related test coverage.

Agreed. I'm not using this for fw_devlink=on test coverage. Just
checking to make sure I've addressed any issues you've seen.

FYI, you can change it at runtime using the kernel commandline param
fw_devlink=permissive. So, you don't have to build all these kernels
again to test if fw_devlink=on is making things worse.

-Saravana