Re: [RFC] PCI: Fix kernel panic of root-port-less PCIe enum due to ASPM

From: Bjorn Helgaas
Date: Thu Oct 06 2016 - 09:14:22 EST


Hi Serge,

On Thu, Oct 06, 2016 at 12:34:15PM +0300, Serge Semin wrote:
> Hello linux folks,
>
> Sometime ago I discovered a kernel panic popping up when PCI subsystem was
> trying to enumerate PCI express bus with ASPM service enabled. Here it is:
>
> [ 5.089667] CPU 0 Unable to handle kernel paging request at virtual
> address 00000060, epc == 80317004, ra == 80316ac8
> [ 5.120952] Oops[#1]:
> ...
> [ 5.528438] Call Trace:
> [ 5.535640] [<80317004>] pcie_aspm_init_link_state+0x6c0/0x814
> [ 5.552843] [<80300c44>] pci_scan_slot+0x140/0x148
> [ 5.566957] [<80301dcc>] pci_scan_child_bus+0x50/0x1b0
> [ 5.582096] [<80301944>] pci_scan_bridge+0x25c/0x694
> [ 5.596724] [<80301e78>] pci_scan_child_bus+0xfc/0x1b0
> [ 5.611862] [<80301944>] pci_scan_bridge+0x25c/0x694
> [ 5.626488] [<80301e78>] pci_scan_child_bus+0xfc/0x1b0
> [ 5.641628] [<8030215c>] pci_scan_root_bus+0x64/0x124
> [ 5.656528] [<804ca298>] pcibios_scanbus+0xa8/0x188
>
> I more than sure you are familiar with the issue, since I've found the
> mailing discussion: "PCI: avoid NULL deref in alloc_pcie_link_state"
> https://patchwork.kernel.org/patch/2751651/
> https://bugzilla.kernel.org/show_bug.cgi?id=60111
>
> You closed the bugzilla ticket with the next statement:
> "I'm closing this as invalid because the simulated machine where the problem
> occurs has an invalid PCIe topology (an Upstream Port with no Downstream Port
> or Root Port above it). As far as I know, there is no valid topology, e.g.,
> a real hardware machine in the field, that would cause this failure."
>
> I'm strongly disagree with it, since I've got at least two hardware with
> PCIe-bus hierarchy as described in the mailing list. One of them is based on
> Cavium Octeon III CN7020. Here is a ASCII-diagram of PCIe-bus:

Thanks for this information. I reopened that bugzilla; can you attach
complete dmesg logs and "lspci -vv" output for your systems? As I
mentioned in comment #4, I'm completely open to fixing this. My
objections at the time were (1) there was no known hardware that could
trigger the problem, and (2) the proposed fix was ugly and prone to
future breakage. Since we now have real systems that trip over this,
we need to revisit it.

Bjorn

> -+-[0000:01]---00.0-[02-06]--+-02.0-[03-05]--+-00.0-[04-05]----00.0-[05]--
> | | \-00.1 Device [111d:808f]
> | \-04.0-[06]----00.0 Device [126f:0750]
> \-[0000:00]-
>
> where 01:00.0 is an Upstream port of IDT PCIe-swtich.
> / # /usr/local/sbin/lspci -v -s 01:00.0
> 01:00.0 Class 0604: Device 111d:8061
> Flags: bus master, fast devsel, latency 0
> Memory at <unassigned> (32-bit, non-prefetchable) [size=2]
> Memory at <unassigned> (32-bit, non-prefetchable) [size=2]
> Bus: primary=01, secondary=02, subordinate=06, sec-latency=0
> Memory behind bridge: 08000000-0dffffff
> Expansion ROM at <unassigned> [disabled] [size=2]
> Capabilities: [40] Express Upstream Port, MSI 00
> Capabilities: [c0] Power Management version 3
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [200] Virtual Channel
> Kernel driver in use: pcieport
>
> As you can see PCI-bus hierarchy doesn't have root port and the very first
> upstream port is directly connected to Host-PCIe bridge of MCU, which of
> course is not listed by the lspci utility.
>
> Despite of Radim Kr?má?, who suggested a fix, which would de-facto just
> turned ASPM off, I found a quick solution, which disabled ASPM only in
> the first link (Host-PCIe=>Upstream port) of PCIe-bus for such hierarchy.
> ASPM for other PCIe-bus topologies shall work the way it was.
>
> I hope the fix will be helpful.
> Thanks,
>
> =============================
> Serge V. Semin
> Leading Programmer
> Embedded SW development group
> T-platforms
> =============================
>
> Signed-off-by: Serge Semin <fancer.lancer@xxxxxxxxx>
>
> ---
> drivers/pci/pcie/aspm.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 0ec649d..a9295f29 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -522,7 +522,8 @@ static struct pcie_link_state *alloc_pcie_link_state(struct pci_dev *pdev)
> INIT_LIST_HEAD(&link->children);
> INIT_LIST_HEAD(&link->link);
> link->pdev = pdev;
> - if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) {
> + if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> + (!pci_is_root_bus(pdev->bus->parent))) {
> struct pcie_link_state *parent;
> parent = pdev->bus->parent->self->link_state;
> if (!parent) {
> --
> 2.6.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html