Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"

From: Bjorn Helgaas
Date: Fri Jan 29 2021 - 16:57:11 EST


On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote:
> On 1/28/21 5:51 PM, Sinan Kaya wrote:
> > On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
> > > AFAICT, this thread petered out with no resolution.
> > >
> > > If the bandwidth change notifications are important to somebody,
> > > please speak up, preferably with a patch that makes the notifications
> > > disabled by default and adds a parameter to enable them (or some other
> > > strategy that makes sense).
> > >
> > > I think these are potentially useful, so I don't really want to just
> > > revert them, but if nobody thinks these are important enough to fix,
> > > that's a possibility.
> >
> > Hide behind debug or expert option by default? or even mark it as BROKEN
> > until someone fixes it?
> >
> Instead of making it a config option, wouldn't it be better as a kernel
> parameter? People encountering this seem quite competent in passing kernel
> arguments, so having a "pcie_bw_notification=off" would solve their
> problems.

I don't want people to have to discover a parameter to solve issues.
If there's a parameter, notification should default to off, and people
who want notification should supply a parameter to enable it. Same
thing for the sysfs idea.
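
To make the default-off behavior concrete, here is a minimal userspace
sketch, not kernel code. It assumes an opt-in spelling
"pcie_bw_notification=on", modeled on the "=off" suggestion above; that
option and the helper bw_notification_wanted() are hypothetical and do
not exist in the kernel today. Notification stays disabled unless the
command line explicitly asks for it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical option name, modeled on the "=off" suggestion in this thread. */
#define BW_NOTIF_OPT "pcie_bw_notification="

/*
 * Return true only if the command line explicitly requests notifications.
 * Option absent, "off", or any unrecognized value leaves them disabled.
 * If the option appears more than once, the last occurrence wins.
 */
static bool bw_notification_wanted(const char *cmdline)
{
	const size_t optlen = strlen(BW_NOTIF_OPT);
	const char *p = cmdline;
	bool enabled = false;	/* default: off */

	while (*p) {
		size_t toklen;

		p += strspn(p, " \t\n");	/* skip separators */
		toklen = strcspn(p, " \t\n");	/* length of next token */

		if (toklen > optlen && !strncmp(p, BW_NOTIF_OPT, optlen)) {
			const char *val = p + optlen;
			size_t vallen = toklen - optlen;

			/* Only the exact value "on" enables notifications. */
			enabled = (vallen == 2 && !strncmp(val, "on", 2));
		}
		p += toklen;
	}
	return enabled;
}
```

With this shape, a command line that never mentions the option yields
false, so nobody has to discover a parameter to make the messages go
away; only people who want them need to know about it.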

I think we really just need to figure out what's going on. Then it
should be clearer how to handle it. I'm not really in a position to
debug the root cause since I don't have the hardware or the time. If
nobody can figure out what's going on, I think we'll have to make it
disabled by default.

> As far as marking this as broken, I've seen no conclusive evidence to
> tell if it's a sw bug or an actual hardware problem. Could we have a sysfs
> attribute to disable this on a per-downstream-port basis?
>
> e.g.
> echo 0 > /sys/bus/pci/devices/0000:00:04.0/bw_notification_enabled
>
> This probably won't be ideal if there are many devices downtraining their
> links ad-hoc. At worst we'd have a way to silence those messages if we do
> encounter such devices.
>
> Alex