Re: [3.1-rc4] Bus Fatal Error caused by "PCI: Set PCI-E Max PayloadSize on fabric"

From: Jon Mason
Date: Wed Sep 07 2011 - 17:11:05 EST


On Wed, Sep 7, 2011 at 1:58 PM, Simon Kirby <sim@xxxxxxxxxx> wrote:
> On Wed, Sep 07, 2011 at 01:57:28PM -0700, Jon Mason wrote:
>
>> On Wed, Sep 7, 2011 at 1:47 PM, Simon Kirby <sim@xxxxxxxxxx> wrote:
>> > On Wed, Sep 07, 2011 at 12:18:59PM -0700, Simon Kirby wrote:
>> >
>> >> On Wed, Sep 07, 2011 at 10:44:32AM -0700, Jesse Barnes wrote:
>> >>
>> >> > On Wed, 7 Sep 2011 12:52:25 -0400
>> >> > Josh Boyer <jwboyer@xxxxxxxxx> wrote:
>> >> >
>> >> > > On Wed, Sep 7, 2011 at 12:22 PM, Sven Schnelle <svens@xxxxxxxxxxxxxx>
>> >> > > wrote:
>> >> > > > Simon Kirby <sim@xxxxxxxxxx> writes:
>> >> > > >
>> >> > > >> Hello!
>> >> > > >>
>> >> > > >> Since trying 3.1-rc4 on a few Dell servers, all of them have
>> >> > > >> booted up with the amber error LED lit. "ipmitool sel list" shows:
>> >> > > >>
>> >> > > >> ?? ??1 | 09/06/2011 | 17:21:56 | Event Logging Disabled #0x72 | Log
>> >> > > >> area reset/cleared | Asserted 2 | 09/06/2011 | 17:25:38 | Critical
>> >> > > >> Interrupt #0x18 | Bus Fatal Error | Asserted 3 | 09/06/2011 |
>> >> > > >> 17:25:38 | Unknown #0x1a | 4 | 09/06/2011 | 17:25:38 | Unknown
>> >> > > >> #0x1a |
>> >> > > >
>> >> > > > I'm seeing exact the same issue on a Dell 1950 Server. If anyone
>> >> > > > wants me to try additional debugging/patches, feel free to do
>> >> > > > so. Unfortunately i don't have the time/knowledge to debug that by
>> >> > > > myself.
>> >> > >
>> >> > > I thought Jesse or Jon had a revert or partial fix queued up to send
>> >> > > to Linus, but I don't see anything in or post -rc5 yet. ?That was
>> >> > > indicated in https://bugzilla.kernel.org/show_bug.cgi?id=42162
>> >> > >
>> >> > > Jesse, Jon?
>> >> >
>> >> > kernel.org is still down and I haven't pushed anything to github. ?I
>> >> > asked Jon to send his patch directly to Linus today instead.
>> >>
>> >> FWIW, this patch didn't seem to fix it:
>> >> https://bugzilla.kernel.org/attachment.cgi?id=71222
>> >>
>> >> dmesg used to say:
>> >>
>> >> pci 0000:00:02.0: Dev MPS 128 MPSS 256 MRRS 128
>> >> pci 0000:00:02.0: Dev MPS 256 MPSS 256 MRRS 128
>> >> pci 0000:06:00.0: Dev MPS 128 MPSS 256 MRRS 4096
>> >> pci 0000:06:00.0: Dev MPS 256 MPSS 256 MRRS 128
>> >> pci 0000:07:00.0: Dev MPS 128 MPSS 256 MRRS 4096
>> >> pci 0000:07:00.0: Dev MPS 256 MPSS 256 MRRS 128
>> >> pci 0000:08:00.0: Dev MPS 128 MPSS 128 MRRS 128
>> >> pci 0000:08:00.0: MPS configured higher than maximum supported by the device. ?If a bus issue occurs, try running with pci=pcie_bus_safe.
>> >> pci 0000:08:00.0: Dev MPS 256 MPSS 256 MRRS 128
>> >> Uhhuh. NMI received for unknown reason 21 on CPU 0.
>> >> Do you have a strange power saving mode enabled?
>> >> Dazed and confused, but trying to continue
>> >
>> > Ok, I commented out the "pcie_write_mps(dev, mps);" line and the error
>> > stopped, but this made me realize that the pci=pcie_bus_safe option must
>> > have been missing. It turns out I had hacked a custom grub entry to load
>> > the newest kernel into grub instead of the one with the highest version
>> > number (grumble), so the default kopt didn't apply there.
>> >
>> > So, pci=pcie_bus_safe DOES fix this case, and I've confirmed that the
>> > MRRS-dissabling patch makes no difference in this case.
>> >
>> > Can we just make pci=pcie_bus_safe (as in previous behavior) the default,
>> > or make it not change where it would otherwise warn, or does that
>> > basically make the thing useless?
>>
>> I have a patch that does does pcie_bus_safe as the default behavior
>> and does not modify the MRRS.  Would you be willing to test this patch
>> for me?
>
> Sure, of course. (It compiles, ship it. :))

Great, thanks! I've attached a patch file to this e-mail.

Thanks,
Jon

>
> Simon-
>
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0ce6742..4e84fd4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -77,7 +77,7 @@ unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE;
unsigned long pci_hotplug_io_size = DEFAULT_HOTPLUG_IO_SIZE;
unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE;

-enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_PERFORMANCE;
+enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_SAFE;

/*
* The default CLS is used if arch didn't set CLS explicitly and not
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8473727..847ee5d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1396,34 +1396,37 @@ static void pcie_write_mps(struct pci_dev *dev, int mps)

static void pcie_write_mrrs(struct pci_dev *dev, int mps)
{
- int rc, mrrs;
+ int rc, mrrs, dev_mpss;

- if (pcie_bus_config == PCIE_BUS_PERFORMANCE) {
- int dev_mpss = 128 << dev->pcie_mpss;
+ /* In the "safe" case, do not configure the MRRS. There appear to be
+ * issues with setting MRRS to 0 on a number of devices.
+ */

- /* For Max performance, the MRRS must be set to the largest
- * supported value. However, it cannot be configured larger
- * than the MPS the device or the bus can support. This assumes
- * that the largest MRRS available on the device cannot be
- * smaller than the device MPSS.
- */
- mrrs = mps < dev_mpss ? mps : dev_mpss;
- } else
- /* In the "safe" case, configure the MRRS for fairness on the
- * bus by making all devices have the same size
- */
- mrrs = mps;
+ if (pcie_bus_config != PCIE_BUS_PERFORMANCE)
+ return;
+
+ dev_mpss = 128 << dev->pcie_mpss;

+ /* For Max performance, the MRRS must be set to the largest supported
+ * value. However, it cannot be configured larger than the MPS the
+ * device or the bus can support. This assumes that the largest MRRS
+ * available on the device cannot be smaller than the device MPSS.
+ */
+ mrrs = min(mps, dev_mpss);

/* MRRS is a R/W register. Invalid values can be written, but a
- * subsiquent read will verify if the value is acceptable or not.
- * If the MRRS value provided is not acceptable (e.g., too large),
- * shrink the value until it is acceptable to the HW.
+ * subsiquent read will verify if the value is acceptable or not. If
+ * the MRRS value provided is not acceptable (eg, too large), shrink the
+ * value until it is acceptable to the HW.
*/
while (mrrs != pcie_get_readrq(dev) && mrrs >= 128) {
+ dev_warn(&dev->dev, "Attempting to modify the PCI-E MRRS value"
+ " to %d. If any issues are encountered, please try "
+ "running with pci=pcie_bus_safe\n", mrrs);
rc = pcie_set_readrq(dev, mrrs);
if (rc)
- dev_err(&dev->dev, "Failed attempting to set the MRRS\n");
+ dev_err(&dev->dev,
+ "Failed attempting to set the MRRS\n");

mrrs /= 2;
}
@@ -1436,13 +1439,13 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
if (!pci_is_pcie(dev))
return 0;

- dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+ dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));

pcie_write_mps(dev, mps);
pcie_write_mrrs(dev, mps);

- dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+ dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));

return 0;