Re: [PATCH V1] PCI/AER: Configure ECRC only AER is native

From: Bjorn Helgaas
Date: Thu Jan 12 2023 - 13:54:31 EST


On Wed, Jan 11, 2023 at 03:27:51PM -0800, Sathyanarayanan Kuppuswamy wrote:
> On 1/11/23 3:10 PM, Bjorn Helgaas wrote:
> > On Wed, Jan 11, 2023 at 01:42:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
> >> On 1/11/23 12:31 PM, Vidya Sagar wrote:
> >>> As the ECRC configuration bits are part of AER registers, configure
> >>> ECRC only if AER is natively owned by the kernel.
> >>
> >> ecrc command line option takes "bios/on/off" as possible options. It
> >> does not clarify whether "on/off" choices can only be used if AER is
> >> owned by OS or it can override the ownership of ECRC configuration
> >> similar to pcie_ports=native option. Maybe that needs to be clarified.
> >
> > Good point, what do you think of an update like this:
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 6cfa6e3996cf..f7b40a439194 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -4296,7 +4296,9 @@
> > specified, e.g., 12@pci:8086:9c22:103c:198f
> > for 4096-byte alignment.
> > ecrc= Enable/disable PCIe ECRC (transaction layer
> > - end-to-end CRC checking).
> > + end-to-end CRC checking). Only effective
> > + if OS has native AER control (either granted by
> > + ACPI _OSC or forced via "pcie_ports=native").
> > bios: Use BIOS/firmware settings. This is the
> > the default.
> > off: Turn ECRC off
>
> Looks fine. But do we even need "bios" option? Since it is the default
> value, I am not sure why we need to list that as an option again. IMO
> this could be removed.

I agree, it seems pointless.

> > I don't know whether the "ecrc=" parameter is really needed. If we
> > were adding it today, I would ask "why not enable ECRC wherever it is
> > supported?" If there are devices where it's broken, we could always
> > add quirks to disable it on a case-by-case basis.
>
> Checking the original patch which added it, it looks like the intention
> is to give option to boost performance over integrity.
>
> commit 43c16408842b0eeb367c23a6fa540ce69f99e347
> Author: Andrew Patterson <andrew.patterson@xxxxxx>
> Date: Wed Apr 22 16:52:09 2009 -0600
>
> PCI: Add support for turning PCIe ECRC on or off
>
> Adds support for PCI Express transaction layer end-to-end CRC checking
> (ECRC). This patch will enable/disable ECRC checking by setting/clearing
> the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
> support ECRC.
>
> The ECRC setting is controlled by the "pci=ecrc=<policy>" command-line
> option. If this option is not set or is set to 'bios", the enable and
> generation bits are left in whatever state that firmware/BIOS set them to.
> The "off" setting turns them off, and the "on" option turns them on (if the
> device supports it).
>
> Turning ECRC on or off can be a data integrity versus performance
> tradeoff. In theory, turning it on will catch more data errors, turning
> it off means possibly better performance since CRC does not need to be
> calculated by the PCIe hardware and packet sizes are reduced.

Ah, right, and I think I was even part of the conversation when this
was added :)

I'm not sure I would make the same choice today, though. IMHO it's
kind of hard to defend choosing performance over data integrity.

If a platform really wants to sacrifice integrity for performance, it
could retain control of AER, and after Vidya's patch, Linux will leave
the ECRC configuration alone.

Straw-man: If Linux owns AER and ECRC is supported, enable ECRC by
default. Retain "ecrc=off" to turn it off, but drop a note in dmesg
and taint the kernel.
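
To make the straw-man concrete, here is a rough user-space model of the
decision, not kernel code and not a proposed patch. Every name in it
(ecrc_decide(), the enums, the struct) is made up for illustration. It
also assumes that "ecrc=off" does nothing at all, including no taint,
when the platform has retained AER, since after Vidya's patch Linux
would not touch the ECRC bits in that case anyway.

```c
/*
 * User-space model of the straw-man ECRC policy. NOT kernel code.
 * All identifiers are hypothetical and exist only for illustration.
 */
#include <assert.h>
#include <stdbool.h>

/* What the user passed on the command line, if anything. */
enum ecrc_param { ECRC_PARAM_UNSET, ECRC_PARAM_OFF, ECRC_PARAM_ON };

/* What Linux would do with the ECRC enable bits. */
enum ecrc_action { ECRC_LEAVE_ALONE, ECRC_ENABLE, ECRC_DISABLE };

struct ecrc_decision {
	enum ecrc_action action;
	bool warn_and_taint;	/* note in dmesg + taint the kernel */
};

static struct ecrc_decision ecrc_decide(bool aer_native, bool ecrc_capable,
					enum ecrc_param param)
{
	struct ecrc_decision d = { ECRC_LEAVE_ALONE, false };

	assert(param == ECRC_PARAM_UNSET || param == ECRC_PARAM_OFF ||
	       param == ECRC_PARAM_ON);

	/*
	 * The ECRC bits live in the AER capability, so if the platform
	 * kept AER, or the device lacks ECRC, leave everything alone.
	 */
	if (!aer_native || !ecrc_capable)
		return d;

	if (param == ECRC_PARAM_OFF) {
		/* User explicitly chose performance over integrity. */
		d.action = ECRC_DISABLE;
		d.warn_and_taint = true;
	} else {
		/* Default, and explicit "on": enable ECRC. */
		d.action = ECRC_ENABLE;
	}

	return d;
}
```

In this model "ecrc=on" becomes equivalent to the default, so the only
parameter value that changes behavior is "off".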

Bjorn