Re: [GIT PULL] Driver core / debugfs changes for 5.12-rc1

From: Saravana Kannan
Date: Mon Mar 01 2021 - 02:43:56 EST


On Sat, Feb 27, 2021 at 6:27 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 24, 2021 at 10:20:44AM -0800, Linus Torvalds wrote:
> > On Wed, Feb 24, 2021 at 6:27 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > [..] I've reverted that change at
> > > the very end so we don't have to worry about regressions in 5.12.
> >
> > Side note: it would have been really nice to see links to the actual
> > problem reports in the revert commit.
>
> Odd, this showed up in my gmail spam folder, just saw this now :(

Yup, went to spam for me too!

> > Yes, there's a "Link:" line there, but that points to the
> > less-than-useful patch submission for the revert, not to the actual
> > _reasons_ for the revert.
> >
> > Now I'm looking at that revert, and I have absolutely no idea why it
> > happened. Only a very vague "there are still reported regressions
> > happening".
> >
> > I've pulled it, but wanted to just point out that when there's some
> > fairly fundamental revert like this, it really would be good to link
> > to the problems, so that when people try to re-enable it, they have
> > the history for why it didn't work the first time.
> >
> > Now all that history is basically lost (well, hopefully Saravana and
> > you actually remember, but you get my point).
>
> Sorry, the history is on the original commit Link that was reverted, and
> in lots of other emails on lkml over the past few weeks. Next time I'll
> include links to those threads as well.

These are links of interest. All the fixes are at least linked to from
these two threads. And if they are not, they all mention "Fixes" and
list the commit that was reverted. So it's all trackable if we really
find the need to do so in the future.

The original series that set fw_devlink=on where most of the issues
were reported.
[1] - https://lore.kernel.org/lkml/20201218031703.3053753-1-saravanak@xxxxxxxxxx/

This is the series that made fw_devlink=on more forgiving. And a few
issues were reported there.
[2] - https://lore.kernel.org/lkml/20210205222644.2357303-1-saravanak@xxxxxxxxxx/

To summarize, the issues fell into one of these types:
* Drivers would initialize the hardware without actually probing a
struct device that existed AND didn't use the existing mechanisms (Eg:
IRQCHIP_DECLARE) meant to allow this. So fw_devlink makes their
consumers wait forever. Bunch of driver fixes for this, but [2] also
workaround most of these by making fw_devlink/fwnode code a bit
smarter.
* Drivers would initialize the hardware without creating a struct
device at all despite the DT node having a compatible property. [2]
handles this by making fw_devlink a bit smarter.
* Some device link status not getting updated correctly when a driver
is force bound with a device (I know the fix, haven't gotten around to
submitting it).
* fw_devlink causing some probe reordering that should technically be
harmless (all suppliers probed before the consumer), but drivers still
don't like it. Bunch of fixes + reducing/removing unnecessary
reordering of probes by fw_devlink (latter is under review).
* Spinlock/corruption errors that fw_devlink exposed by reordering
some probes, but the actual issue was unrelated to fw_devlink.

So it was a mix of making fw_devlink smarter to deal with existing
code and fixing some drivers.

Thanks,
Saravana