Re: [PATCH] PCI: aardvark: Implement re-issuing config requests on CRS response

From: Bjorn Helgaas
Date: Tue Sep 14 2021 - 16:55:31 EST


On Tue, Sep 14, 2021 at 10:46:59PM +0200, Pali Rohár wrote:
> On Tuesday 14 September 2021 15:26:56 Bjorn Helgaas wrote:
> > On Mon, Aug 23, 2021 at 02:02:14PM +0200, Pali Rohár wrote:
> > > Commit 43f5c77bcbd2 ("PCI: aardvark: Fix reporting CRS value") fixed
> > > handling of CRS response and when CRSSVE flag was not enabled it marked CRS
> > > response as failed transaction (due to simplicity).
> > >
> > > But pci-aardvark.c driver is already waiting up to the PIO_RETRY_CNT count
> > > for PIO config response and implementation of re-issuing config requests
> > > according to PCIe base specification is therefore simple.
> >
> > I think the spec is confusingly worded. It says (PCIe r5.0, sec
> > 2.3.2) that when handling a Completion with CRS status for a config
> > request (paraphrasing slightly),
> >
> > If CRS Software Visibility is enabled, for config reads of Vendor
> > ID, the Root Complex returns 0x0001 for Vendor ID.
> >
> > Otherwise ... the Root Complex must re-issue the Configuration
> > Request as a new Request.
> >
> > BUT:
> >
> > A Root Complex implementation may choose to limit the number of
> > Configuration Request/CRS Completion Status loops before
> > determining that something is wrong with the target of the Request
> > and taking appropriate action, e.g., complete the Request to the
> > host as a failed transaction.
> >
> > So I think zero is a perfectly valid number of retries, and I'm pretty
> > sure there are RCs that never retry.
> >
> > Is there a benefit to doing retry like this in the driver? Can we not
> > simply rely on retries at a higher level?
>
> I think that all drivers handle 0xFFFFFFFF read response as some kind of
> fatal error.

True.

> And because every PCI error is mapped to value 0xFFFFFFFF
> it means that higher level has no chance to distinguish easily between
> unsupported request and completion retry status.

Also true. But we don't *want* higher-level code to distinguish
these. The only place we should ever see CRS status is during
enumeration and after reset. Those code paths should look for CRS
status and retry as needed.

It is illegal for a device to return CRS after it has returned a
successful completion unless an intervening reset has occurred, so
drivers and other code should never see it.
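
As a rough illustration of what "look for CRS status" means in an
enumeration path, here is a minimal sketch. It is not code from the
kernel or from this patch; the enum and function names are made up.
It assumes CRS Software Visibility is enabled, so the Root Complex
reports CRS as Vendor ID 0x0001, and that every other error shows up
as all 1s:

```c
#include <stdint.h>

/* Hypothetical result codes for a Vendor/Device ID dword read. */
enum vid_status {
	VID_PRESENT,	/* successful completion, device answered */
	VID_CRS,	/* device not ready yet, worth retrying */
	VID_ABSENT	/* error or nothing there, retrying won't help */
};

/*
 * Classify the dword read from config offset 0x00.
 * The low 16 bits hold the Vendor ID.
 */
static enum vid_status classify_vendor_id(uint32_t l)
{
	if (l == 0xffffffffu)		/* generic error response */
		return VID_ABSENT;
	if ((l & 0xffffu) == 0x0001u)	/* CRS SV marker in Vendor ID */
		return VID_CRS;
	return VID_PRESENT;
}
```

Only the caller that issued the first read after power-up or reset
needs this distinction; everybody else just sees data or an error.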

> And the issue is also there with write requests. Is anybody checking the
> return value of the pci_bus_write_config function?

Similar case here. The enumeration and wait-after-reset paths always
do *reads* until we get a successful completion, so I don't think we
ever issue a write that can get CRS.
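
The ordering I mean can be sketched like this. Again, this is only an
illustration with a simulated device, not kernel code: struct sim_dev
and all the helpers are hypothetical, the doubling delay is just one
possible backoff policy, and the sleep is replaced by a counter so the
example stays self-contained:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated device that answers CRS for a while. */
struct sim_dev {
	int crs_left;	/* reads that will still complete with CRS */
	uint32_t id;	/* Vendor/Device ID dword once the device is ready */
	uint32_t reg;	/* some writable config register */
	int writes;	/* number of config writes actually issued */
};

/* Read the ID dword; with CRS SV, CRS shows up as Vendor ID 0x0001. */
static uint32_t sim_read_id(struct sim_dev *d)
{
	if (d->crs_left > 0) {
		d->crs_left--;
		return 0xffff0001u;
	}
	return d->id;
}

/* Poll with *reads* until the device stops returning CRS or we time out. */
static bool wait_not_crs(struct sim_dev *d, unsigned int timeout_ms,
			 unsigned int *waited_ms)
{
	unsigned int delay = 1;

	*waited_ms = 0;
	while ((sim_read_id(d) & 0xffffu) == 0x0001u) {
		if (*waited_ms >= timeout_ms)
			return false;
		*waited_ms += delay;	/* real code would sleep here */
		delay *= 2;
	}
	return true;
}

/* Only issue the write after a read has completed successfully. */
static bool config_write_after_ready(struct sim_dev *d, uint32_t val,
				     unsigned int timeout_ms)
{
	unsigned int waited;

	if (!wait_not_crs(d, timeout_ms, &waited))
		return false;
	d->reg = val;
	d->writes++;
	return true;
}
```

Because the write is gated on a successful read, the write itself never
has to deal with a CRS completion.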

> I guess that a zero retry count, as you pointed out, is valid. But is it
> something that we want?
>
> I sent this patch because implementation of request retry was very
> simple. Driver already waits for response, so adding another loop around
> it does not increase code complexity.

"Adding a loop does not increase code complexity"? Well, maybe not
MUCH, but it is a little, and the analysis behind it is fairly
significant.

Bjorn