Re: [PATCH v1 1/2] driver core: fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT

From: Saravana Kannan
Date: Tue Aug 31 2021 - 15:27:12 EST


On Tue, Aug 31, 2021 at 6:16 AM Andrew Lunn <andrew@xxxxxxx> wrote:
>
> > > I must admit, my main problem at the moment is -rc1 in two weeks
> > > time. It seems like a number of board with Ethernet switches will be
> > > broken, that worked before. phy-handle is not limited to switch
> > > drivers, it is also used for Ethernet drivers. So it could be, a
> > > number of Ethernet drivers are also going to be broken in -rc1?
> >
> > Again, in those cases, based on your FEC example, fw_devlink=on
> > actually improves things.
>
> Debatable. I did some testing. As expected some boards with Ethernet
> switches are now broken.

To clarify myself, I'm saying the patch to parse "ethernet" [8] will
make things better with fw_devlink=on for FEC. Not sure if you tested
that patch or not.

And yes, "phy-handle" will make things worse for switches because it
has two issues that need to be fixed. One is this deferred probe thing
for which I gave a patch in the previous email and the other is what
Marek reported (fix in the works). So can you revert "phy-handle"
support and test with [8] and you should see things be better with
fw_devlink=on.

[8] - https://lore.kernel.org/netdev/CAGETcx9=AyEfjX_-adgRuX=8a0MkLnj8sy2KJGhxpNCinJu4yA@xxxxxxxxxxxxxx/

> Without fw_devlink=on, some boards are not
> optimal, but they actually work. With it, they are broken.
>
> I did a bisect, and they have been broken since:
>
> ea718c699055c8566eb64432388a04974c43b2ea is the first bad commit
> commit ea718c699055c8566eb64432388a04974c43b2ea
> Author: Saravana Kannan <saravanak@xxxxxxxxxx>
> Date: Tue Mar 2 13:11:32 2021 -0800
>
> Revert "Revert "driver core: Set fw_devlink=on by default""
>
> This reverts commit 3e4c982f1ce75faf5314477b8da296d2d00919df.
>
> Since all reported issues due to fw_devlink=on should be addressed by
> this series, revert the revert. fw_devlink=on Take II.
>
> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/20210302211133.2244281-4-saravanak@xxxxxxxxxx
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
> drivers/base/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> So however it is fixed, it needs to go into stable, not just -rc1.

Not sure what was the tip of the tree with which you bisected. For
example, if phy-handle broke things, bisect could still point at this
patch above depending on whether the first bisect is good/bad. Because
reverting this patch effectively disabled phy-handle parsing too.

To be clear, I'm not saying things aren't broken. I'm just pointing
out some nuances with the bisect that we need to be aware of.

> > Again, it's not a widespread problem as I explained before.
> > fw_devlink=on has been the default for 2 kernel versions now. With no
> > unfixed reported issues.
>
> Given that some Ethernet switches have been broken all that time, i
> wonder what else has been broken? Normally, the kernel which is
> release in December becomes the next LTS. It then gets picked up by
> the distros and more wide spread tested. So it could be, you get a
> flood of reports in January and February about things which are
> broken. This is why i don't think you should be relying on bug
> reports, you should be taking a more proactive stance and trying to
> analyse the DTB blobs.

As I mentioned earlier, just looking at DTB doesn't tell me much
because the driver could still be fine depending on how it's written.
Also, I don't have an easy way to do this. If I find a way, I'll do
it.

Btw, most bug reports that have been raised have been fixed with
generic fixes that address general DT patterns. For example,
fw_devlink has cycle detection built in, has support for devices that
never get probed, etc. Enabling fw_devlink=on and handling bug reports
with generic fixes has worked well so far to get fw_devlink to where
it is today. I've tried to be very quick in responding to fw_devlink
issues -- and that has worked out so far and hopefully it'll continue
to work out.

> I will spend some time trying out your proposed fix. See if they are a
> quick fix for stable.

Thanks for testing it out. And worst case, we'll revert phy-handle support.


-Saravana