Re: [PATCH 1/2] PCI: pciehp: Add support for OS-First Hotplug and AER/DPC

From: Smita Koralahalli
Date: Tue Feb 14 2023 - 04:34:50 EST


On 11/9/2022 11:12 AM, Sathyanarayanan Kuppuswamy wrote:
Hi,

On 10/31/22 5:07 PM, Smita Koralahalli wrote:
Current systems support Firmware-First model for hot-plug. In this model,
firmware holds the responsibilty for executing the HW sequencing actions on
an async or surprise add and removal events. Additionally, according to
Section 6.7.6 of PCIe Base Specification [1], firmware must also handle
the side-effects (DPC/AER events) reported on an async removal and is
abstract to the OS.
I don't see anything about hotplug firmware-first info in the above
specification reference and I don't think firmware first logic exists for
hotplug. Are you referring to AER/DPC firmware handling support?

Yes, sorry for the confusion here. Its AER/DPC FW vs OS handling support is what the
patch addresses about.

This model however, poses issues while rolling out updates or fixing bugs
as the servers need to be brought down for firmware updates. Hence,
introduce support for OS-First hot-plug and AER/DPC. Here, OS is
responsible for handling async add and remove along with handling of
AER/DPC events which are generated as a side-effect of async remove.
I think for OS handling we use "native" term. So use it instead of "OS-First".

Will take care.

The implementation is as follows: On an async remove a DPC is triggered as
a side-effect along with an MSI to the OS. Determine it's an async remove
by checking for DPC Trigger Status in DPC Status Register and Surprise
Down Error Status in AER Uncorrected Error Status to be non-zero. If true,
treat the DPC event as a side-effect of async remove, clear the error
status registers and continue with hot-plug tear down routines. If not,
follow the existing routine to handle AER/DPC errors.
I am wondering why your recovery fails below? Even if the device is disconnected,
error report handler will return PCI_ERS_RESULT_NEED_RESET and then attempt to do
reset using report_slot_reset(). This should work for your case, right?

Not sure, might be because two interrupts are being fired simultaneously here (pciehp and dpc).
The device is already brought down by pciehp handler and dpc handler is trying to reset the bridge
after the device is brought down?

Thanks,
Smita

Dmesg before:

pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000
pcieport 0000:00:01.4: [ 5] SDES (First)
nvme nvme2: frozen state error detected, reset controller
pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
pcieport 0000:00:01.4: AER: subordinate device reset failed
pcieport 0000:00:01.4: AER: device recovery failed
pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
nvme2n1: detected capacity change from 1953525168 to 0
pci 0000:04:00.0: Removing from iommu group 49

Dmesg after:

pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
nvme1n1: detected capacity change from 1953525168 to 0
pci 0000:04:00.0: Removing from iommu group 37
pcieport 0000:00:01.4: pciehp: Slot(16): Card present
pci 0000:04:00.0: [8086:0a54] type 00 class 0x010802
pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
pci 0000:04:00.0: Max Payload Size set to 512 (was 128, max 512)
pci 0000:04:00.0: enabling Extended Tags
pci 0000:04:00.0: Adding to iommu group 37
pci 0000:04:00.0: BAR 0: assigned [mem 0xf2400000-0xf2403fff 64bit]
pcieport 0000:00:01.4: PCI bridge to [bus 04]
pcieport 0000:00:01.4: bridge window [io 0x1000-0x1fff]
pcieport 0000:00:01.4: bridge window [mem 0xf2400000-0xf24fffff]
pcieport 0000:00:01.4: bridge window [mem 0x20080800000-0x200809fffff 64bit pref]
nvme nvme1: pci function 0000:04:00.0
nvme 0000:04:00.0: enabling device (0000 -> 0002)
nvme nvme1: 128/0/0 default/read/poll queues

[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmembers.pcisig.com%2Fwg%2FPCI-SIG%2Fdocument%2F16609&data=05%7C01%7CSmita.KoralahalliChannabasappa%40amd.com%7Cc99c75e249cb44479d5e08dac2866426%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638036179704826384%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=IdqEMcVn5E70P0BW%2F8y6DmwI8ud6HIxWq2uNuwxT2pE%3D&reserved=0

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
---
drivers/pci/pcie/dpc.c | 61 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index f5ffea17c7f8..e422876f51ad 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -293,10 +293,71 @@ void dpc_process_error(struct pci_dev *pdev)
}
}
+static void pci_clear_surpdn_errors(struct pci_dev *pdev)
+{
+ u16 reg16;
+ u32 reg32;
+
+ pci_read_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, &reg32);
+ pci_write_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, reg32);
+
+ pci_read_config_word(pdev, PCI_STATUS, &reg16);
+ pci_write_config_word(pdev, PCI_STATUS, reg16);
+
+ pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &reg16);
+ pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
+}
+
+static void pciehp_handle_surprise_removal(struct pci_dev *pdev)
+{
+ if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
+ return;
+
+ /*
+ * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
+ * removal might be unexpected, errors might be reported as a side
+ * effect of the event and software should handle them as an expected
+ * part of this event.
+ */
+ pci_aer_raw_clear_status(pdev);
+ pci_clear_surpdn_errors(pdev);
+
+ /*
+ * According to Section 6.13 and 6.15 of the PCIe Base Spec 6.0,
+ * following a hot-plug event, clear the ARI Forwarding Enable bit
+ * and AtomicOp Requester Enable as its not determined whether the
+ * next device inserted will support these capabilities. AtomicOp
+ * capabilities are not supported on PCI Express to PCI/PCI-X Bridges
+ * and any newly added component may not be an ARI device.
+ */
+ pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL2,
+ (PCI_EXP_DEVCTL2_ARI | PCI_EXP_DEVCTL2_ATOMIC_REQ));
+
+ pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
+ PCI_EXP_DPC_STATUS_TRIGGER);
+}
+
+static bool pciehp_is_surprise_removal(struct pci_dev *pdev)
+{
+ u16 status;
+
+ pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
+
+ if (!(status & PCI_ERR_UNC_SURPDN))
+ return false;
+
+ pciehp_handle_surprise_removal(pdev);
+
+ return true;
+}
+
static irqreturn_t dpc_handler(int irq, void *context)
{
struct pci_dev *pdev = context;
+ if (pciehp_is_surprise_removal(pdev))
+ return IRQ_HANDLED;
+
dpc_process_error(pdev);
/* We configure DPC so it only triggers on ERR_FATAL */