PCI-E Link training bug

From: Jeff Roberson
Date: Tue Aug 03 2010 - 06:35:59 EST


Hello Folks,

At least one Intel chipset will occasionally negotiate an x4 link for an x8 device in an x8 port. It is a known erratum in the 5400 MCH. Disabling and then re-enabling the link is all that is required to restore full throughput. Toggling the retrain bit in the PCIe Link Control register alone is insufficient.
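The disable/re-enable sequence comes down to two writes to the Link Control register. The following is only a user-space sketch of the bit manipulation, not the patch itself: the helper names are hypothetical, the bit values are taken from the pci_regs.h hunk below, and unlike the patch the second helper clears Link Disable explicitly rather than relying on the value read earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in the pci_regs.h hunk of the patch. */
#define LNKCTL_DIS 0x10 /* Link Disable */
#define LNKCTL_RL  0x20 /* Retrain Link */

/* Value to write to Link Control to take the link down. */
static uint16_t lnkctl_disable(uint16_t ctl)
{
	return (uint16_t)(ctl | LNKCTL_DIS);
}

/* Value to write afterwards: clear Link Disable and request a retrain. */
static uint16_t lnkctl_reenable(uint16_t ctl)
{
	return (uint16_t)((ctl & ~LNKCTL_DIS) | LNKCTL_RL);
}

/* Hypothetical self-check; 0x0040 is an arbitrary starting value. */
static void lnkctl_example(void)
{
	uint16_t ctl = 0x0040;

	assert(lnkctl_disable(ctl) == 0x0050);
	assert(lnkctl_reenable(lnkctl_disable(ctl)) == 0x0060);
}
```

In the driver the two writes are separated by a delay (the patch uses msleep(125)) so the link has time to go down and come back up.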

I have added a small bit of code to the PCIe port driver which checks for this condition and attempts to retrain the link. It may give a false positive when a narrower device is legitimately installed in a wider port, since the negotiated width is then expected to be below the port's capability. This should be harmless, although I would not rule out poor implementations having issues with gratuitous retraining.
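The check itself can be sketched in user space as follows. This is an illustration under stated assumptions, not the driver code: the function names are hypothetical, and the register values in the self-check are made up. It uses the same decoding as the patch, where the width sits in bits 9:4 of both the Link Capabilities and Link Status registers, and a reported width of 0 is not treated as a mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Capabilities and Link Status both carry a width in bits 9:4. */
static unsigned int link_width(uint32_t reg)
{
	return (reg >> 4) & 0x3f;
}

/* True when the negotiated width is nonzero and differs from the capability. */
static bool link_needs_retrain(uint32_t linkcap, uint16_t linkstat)
{
	unsigned int capwidth = link_width(linkcap);
	unsigned int width = link_width(linkstat);

	return width != 0 && width != capwidth;
}

/* Hypothetical self-check with made-up register values. */
static void link_width_example(void)
{
	assert(link_width(0x81) == 8);            /* x8 capability */
	assert(link_needs_retrain(0x81, 0x41));   /* x8 port, x4 negotiated */
	assert(!link_needs_retrain(0x81, 0x81));  /* widths match */
	assert(!link_needs_retrain(0x81, 0x01));  /* width 0: nothing to do */
}
```

Note that an x4 card sitting in an x8 port also makes link_needs_retrain() return true, which is exactly the false positive described above.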

Thanks,
Jeff

diff -r d021d01742ae drivers/pci/pcie/portdrv_pci.c
--- a/drivers/pci/pcie/portdrv_pci.c Tue Mar 09 15:13:58 2010 -0800
+++ b/drivers/pci/pcie/portdrv_pci.c Tue Aug 03 03:25:51 2010 -0700
@@ -70,6 +70,47 @@
#endif

/*
+ * pcie_portdrv_check_link - Check for link width mismatch
+ * @dev: PCI-Express port device to conditionally retrain
+ *
+ * Verify that the negotiated link width matches the link's capability and
+ * attempt to retrain if it does not. This may be a perfectly valid
+ * configuration; however, on some Intel chipsets (i5400) an erratum
+ * may cause the link to negotiate down erroneously.
+ */
+static inline void
+pcie_portdrv_check_link(struct pci_dev *dev)
+{
+ u32 linkcap;
+ u16 linkstat;
+ u16 linkctrl;
+ int capwidth;
+ int width;
+ int pos;
+ int i;
+
+ pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
+ for (i = 0; i < 3; i++) {
+ pci_read_config_dword(dev, pos + PCI_EXP_LNKCAP, &linkcap);
+ pci_read_config_word(dev, pos + PCI_EXP_LNKSTA, &linkstat);
+ capwidth = (linkcap >> 4) & 0x3f;
+ width = (linkstat >> 4) & 0x3f;
+ if (width == capwidth || width == 0)
+ return;
+ dev_printk(KERN_INFO, &dev->dev,
+ "Link width mismatch: %d != %d, retraining\n",
+ capwidth, width);
+ pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &linkctrl);
+ pci_write_config_word(dev, pos + PCI_EXP_LNKCTL,
+ linkctrl | PCI_EXP_LNKCTL_DIS);
+ msleep(125);
+ pci_write_config_word(dev, pos + PCI_EXP_LNKCTL,
+ linkctrl | PCI_EXP_LNKCTL_RL);
+ msleep(125);
+ }
+}
+
+/*
* pcie_portdrv_probe - Probe PCI-Express port devices
* @dev: PCI-Express port device being probed
*
@@ -99,6 +140,7 @@
pci_disable_device(dev);
return -ENOMEM;
}
+ pcie_portdrv_check_link(dev);

pcie_portdrv_save_config(dev);

diff -r d021d01742ae include/linux/pci_regs.h
--- a/include/linux/pci_regs.h Tue Mar 09 15:13:58 2010 -0800
+++ b/include/linux/pci_regs.h Tue Aug 03 03:25:51 2010 -0700
@@ -400,6 +400,7 @@
#define PCI_EXP_LNKCAP_L1EL 0x38000 /* L1 Exit Latency */
#define PCI_EXP_LNKCAP_CLKPM 0x40000 /* L1 Clock Power Management */
#define PCI_EXP_LNKCTL 16 /* Link Control */
+#define PCI_EXP_LNKCTL_DIS 0x10 /* Disable Link */
#define PCI_EXP_LNKCTL_RL 0x20 /* Retrain Link */
#define PCI_EXP_LNKCTL_CCC 0x40 /* Common Clock COnfiguration */
#define PCI_EXP_LNKCTL_CLKREQ_EN 0x100 /* Enable clkreq */