Re: [PATCH 1/5] PCI/DPC: Ignore devices with no AER Capability

From: Bjorn Helgaas
Date: Tue Dec 01 2020 - 10:35:17 EST


On Sat, Nov 28, 2020 at 08:32:32PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> On 11/28/20 3:25 PM, Bjorn Helgaas wrote:
> > On Sat, Nov 28, 2020 at 01:56:23PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> > > On 11/28/20 1:53 PM, Bjorn Helgaas wrote:
> > > > On Sat, Nov 28, 2020 at 01:49:46PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> > > > > On 11/28/20 12:24 PM, Bjorn Helgaas wrote:
> > > > > > On Wed, Nov 25, 2020 at 06:01:57PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> > > > > > > On 11/25/20 5:18 PM, Bjorn Helgaas wrote:
> > > > > > > > From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > > > > > >
> > > > > > > > Downstream Ports may support DPC regardless of whether they support AER
> > > > > > > > (see PCIe r5.0, sec 6.2.10.2). Previously, if the user booted with
> > > > > > > > "pcie_ports=dpc-native", it was possible for dpc_probe() to succeed even if
> > > > > > > > the device had no AER Capability, but dpc_get_aer_uncorrect_severity()
> > > > > > > > depends on the AER Capability.
> > > > > > > >
> > > > > > > > dpc_probe() previously failed if:
> > > > > > > >
> > > > > > > > !pcie_aer_is_native(pdev) && !pcie_ports_dpc_native
> > > > > > > > !(pcie_aer_is_native() || pcie_ports_dpc_native) # by De Morgan's law
> > > > > > > >
> > > > > > > > so it succeeded if:
> > > > > > > >
> > > > > > > > pcie_aer_is_native() || pcie_ports_dpc_native
> > > > > > > >
> > > > > > > > Fail dpc_probe() if the device has no AER Capability.
> > > > > > > >
> > > > > > > > Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > > > > > > Cc: Olof Johansson <olof@xxxxxxxxx>
> > > > > > > > ---
> > > > > > > > drivers/pci/pcie/dpc.c | 3 +++
> > > > > > > > 1 file changed, 3 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > > > > > > index e05aba86a317..ed0dbc43d018 100644
> > > > > > > > --- a/drivers/pci/pcie/dpc.c
> > > > > > > > +++ b/drivers/pci/pcie/dpc.c
> > > > > > > > @@ -287,6 +287,9 @@ static int dpc_probe(struct pcie_device *dev)
> > > > > > > > int status;
> > > > > > > > u16 ctl, cap;
> > > > > > > > + if (!pdev->aer_cap)
> > > > > > > > + return -ENOTSUPP;
> > > > > > > Don't we check aer_cap support in drivers/pci/pcie/portdrv_core.c ?
> > > > > > >
> > > > > > > We don't enable DPC service, if AER service is not enabled. And AER
> > > > > > > service is only enabled if AER capability is supported.
> > > > > > >
> > > > > > > So dpc_probe() should not happen if AER capability is not supported?
> > > > > >
> > > > > > I don't think that's always true. If I'm reading this right, we have
> > > > > > this:
> > > > > >
> > > > > > get_port_device_capability(...)
> > > > > > {
> > > > > > #ifdef CONFIG_PCIEAER
> > > > > > if (dev->aer_cap && ...)
> > > > > > services |= PCIE_PORT_SERVICE_AER;
> > > > > > #endif
> > > > > >
> > > > > > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
> > > > > > pci_aer_available() &&
> > > > > > (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> > > > > > services |= PCIE_PORT_SERVICE_DPC;
> > > > > > }
> > > > > >
> > > > > > and in the case where:
> > > > > >
> > > > > > - CONFIG_PCIEAER=y
> > > > > > - booted with "pcie_ports=dpc-native" (pcie_ports_dpc_native is true)
> > > > > > - "dev" has no AER capability
> > > > > > - "dev" has DPC capability
> > > > > >
> > > > > > I think we do enable PCIE_PORT_SERVICE_DPC.
> > > > > Got it. But further looking into it, I am wondering whether
> > > > > we should keep this dependency? Currently we just use it to
> > > > > dump the error information. Do we need to create dependency
> > > > > between DPC and AER (which is functionality not dependent) just
> > > > > to see more details about the error?
> > > >
> > > > That's a good question, but I don't really want to get into the actual
> > > > operation of the AER and DPC drivers in this series, so maybe
> > > > something we should explore later.
> >
> > > In that case, can you move this check to
> > > drivers/pci/pcie/portdrv_core.c? I don't see the point of
> > > distributed checks in both get_port_device_capability() and
> > > dpc_probe().
> >
> > I totally agree that these distributed checks are terrible, but my
> > long-term hope is to get rid of portdrv and handle these "services"
> > more like we handle other capabilities. For example, maybe we can
> > squash dpc_probe() into pci_dpc_init(), so I'd actually like to move
> > things from get_port_device_capability() into dpc_probe().
> Removing the service driver model will be a major overhaul. It would
> affect even the error recovery drivers. You can find motivation
> for service drivers in Documentation/PCI/pciebus-howto.rst.

Part of that motivation for portdrv is to allow multiple service
drivers to run simultaneously on the same PCIe port. That is also
part of the *problem* with portdrv. There are several PCIe ports that
implement device-specific functionality via their BARs, and there is
currently no way for drivers to cleanly claim those ports because
they're already claimed by portdrv.

> But till we fix this part, I recommend grouping all dependency checks
> to one place (either dpc_probe() or portdrv service driver).

I think they should go in the driver, e.g., dpc_probe(). The typical
model is that bus drivers only match a device with the appropriate
driver, and the device driver does more specialized checks. For
example, the PCI bus driver only looks at the Vendor ID, Device ID,
and Class code.

But we don't have to solve all of this right now.

Bjorn