Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

From: Marc Zyngier
Date: Mon Jan 18 2021 - 14:37:27 EST


On 2021-01-18 19:16, Geert Uytterhoeven wrote:
> Hi Marc,
>
> On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <maz@xxxxxxxxxx> wrote:
>> On 2021-01-18 17:39, Geert Uytterhoeven wrote:
>>> On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <saravanak@xxxxxxxxxx>
>>> wrote:
>>>> Cyclic dependencies in some firmware was one of the last remaining
>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>>>> dependencies don't block probing, set fw_devlink=on by default.
>>>>
>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>>> only for systems with device tree firmware):
>>>> * Significantly cuts down deferred probes.
>>>> * Device probe is effectively attempted in graph order.
>>>> * Makes it much easier to load drivers as modules without having to
>>>> worry about functional dependencies between modules (depmod is still
>>>> needed for symbol dependencies).
>>>>
>>>> If this patch prevents some devices from probing, it's very likely due
>>>> to the system having one or more device drivers that "probe"/set up a
>>>> device (DT node with compatible property) without creating a struct
>>>> device for it. If we hit such cases, the device drivers need to be
>>>> fixed so that they populate struct devices and probe them like normal
>>>> device drivers so that the driver core is aware of the devices and their
>>>> status. See [1] for an example of such a case.
>>>>
>>>> [1] -
>>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@xxxxxxxxxxxxxx/
>>>> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
>>>
>>> Shimoda-san reported that next-20210111 and later fail to boot
>>> on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
>>> is enabled.
>>>
>>> I have bisected this to commit e590474768f1cc04 ("driver core: Set
>>> fw_devlink=on by default").
>>
>> There is a tentative patch from Saravana here[1], which works around
>> some issues on my RK3399 platform, and it'd be interesting to find
>> out whether that helps on your system.
>>
>> Thanks,
>>
>> M.
>>
>> [1] https://lore.kernel.org/r/20210116011412.3211292-1-saravanak@xxxxxxxxxx
>
> Thanks for the suggestion, but given no devices probe (incl. GPIO
> providers), I'm afraid it won't help. [testing] Indeed.
>
> With the debug prints in device_links_check_suppliers enabled, and
> some postprocessing, I get:
>
> 255 supplier e6180000.system-controller not ready
> 9 supplier fe990000.iommu not ready
> 9 supplier fe980000.iommu not ready
> 6 supplier febd0000.iommu not ready
> 6 supplier ec670000.iommu not ready
> 3 supplier febe0000.iommu not ready
> 3 supplier e7740000.iommu not ready
> 3 supplier e6740000.iommu not ready
> 3 supplier e65ee000.usb-phy not ready
> 3 supplier e6570000.iommu not ready
> 3 supplier e6054000.gpio not ready
> 3 supplier e6053000.gpio not ready
>
> As everything is part of a PM Domain, the (lack of the) system controller
> must be the culprit. What's wrong with it? It is registered very early in
> the boot:
>
> [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
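
Geert's per-supplier counts come from the debug prints in device_links_check_suppliers() plus postprocessing that is not shown. A minimal Python sketch of one way to produce such a tally follows. It assumes each deferral is logged on a line containing "supplier <name> not ready" (whatever precedes that is ignored); tally_unready_suppliers() and format_tally() are hypothetical helpers, not kernel tooling.

```python
import re
from collections import Counter

# Assumption: every deferral is logged on a line containing
# "supplier <device-name> not ready"; any prefix before it is ignored.
SUPPLIER_RE = re.compile(r"supplier (\S+) not ready")


def tally_unready_suppliers(log_lines):
    """Count deferral messages per supplier, most frequent first."""
    counts = Counter()
    for line in log_lines:
        match = SUPPLIER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    # Sort by count, then by name, both descending, so ties are ordered
    # the way `sort -rn` would order whole "count name" lines.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)


def format_tally(tally):
    """Render the tally in the "<count> supplier <name> not ready" form."""
    return [f"{count} supplier {name} not ready" for name, count in tally]
```

Lines captured from dmesg can be passed in as a list of strings; the result is similar to piping the matching lines through `sort | uniq -c | sort -rn`.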

Yeah, this looks like the exact same problem. The fw_devlink code assumes
that because a node has a "compatible" property, there will be a driver
directly associated with that node.

If any other node has a reference to that first node, the dependency
will only get resolved if/when that first node is bound to a driver.
Trouble is, there is *tons* of code in the tree that invalidates this
heuristic, and for each occurrence of this we get another failure.

The patch I referred to papers over it by registering a dummy driver,
but that doesn't scale easily...
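
A toy Python model of the heuristic described above may make the failure mode concrete. It is a sketch under simplifying assumptions, not the driver core's implementation: every referenced node with a "compatible" property is treated as a supplier that a driver will eventually bind to, and a consumer probes only once all such suppliers are bound. Node, probe_all(), sample_board() and the node names are hypothetical. A supplier that is set up without any driver binding (like the R-Car system controller, whose PM domains are registered from rcar_sysc_pd_init() early in boot) never becomes bound, so everything referencing it stays deferred; letting a driver bind to it, roughly what the dummy-driver workaround does, unblocks the consumers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a DT node (hypothetical, not a kernel structure)."""
    name: str
    compatible: bool   # does the node have a "compatible" property?
    has_driver: bool   # will a driver ever bind to a struct device for it?
    suppliers: list = field(default_factory=list)  # names of referenced nodes


def probe_all(nodes):
    """Repeatedly probe every node whose suppliers are all bound.

    Returns (probe order, names of nodes left deferred). A referenced node
    with "compatible" is assumed to get a driver, so consumers wait for it.
    """
    by_name = {n.name: n for n in nodes}
    bound = set()
    order = []
    progress = True
    while progress:
        progress = False
        for node in nodes:
            if node.name in bound or not node.has_driver:
                continue
            waiting_on = [s for s in node.suppliers
                          if by_name[s].compatible and s not in bound]
            if not waiting_on:
                bound.add(node.name)
                order.append(node.name)
                progress = True
    deferred = sorted(n.name for n in nodes
                      if n.has_driver and n.name not in bound)
    return order, deferred


def sample_board(sysc_has_driver):
    """Hypothetical board where everything depends on the system controller."""
    return [
        Node("system-controller", compatible=True, has_driver=sysc_has_driver),
        Node("gpio", compatible=True, has_driver=True,
             suppliers=["system-controller"]),
        Node("iommu", compatible=True, has_driver=True,
             suppliers=["system-controller"]),
        Node("usb", compatible=True, has_driver=True,
             suppliers=["gpio", "iommu"]),
    ]
```

With sysc_has_driver=False nothing probes and gpio, iommu and usb all stay deferred. With sysc_has_driver=True (the dummy-driver case) probing proceeds supplier-first: system-controller, gpio, iommu, usb, which is the "graph order" the patch description refers to.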

M.
--
Jazz is not dead. It just smells funny...