Re: Help needed in understanding weird PCIe issue on imx6q (PCIe just goes bad)

From: Bjorn Helgaas
Date: Wed Feb 26 2020 - 18:25:55 EST


On Sat, Feb 22, 2020 at 04:25:41PM +0100, Fawad Lateef wrote:
> Hello,
>
> I am trying to figure out an issue on our i.MX6Q-based design where
> the PCIe interface goes bad.
>
> We have a Phytec i.MX6Q eMMC SOM attached to our custom-designed
> board. The PCIe root-complex of the i.MX6Q is attached to a PLX
> switch (PEX8605).
>
> The Linux kernel version is 4.19.9x and also 4.14.134 (from Phytec's
> linux-mainline repo). The kernel does not have PCIe hot-plug or PNP
> enabled in its config.
>
> The PLX switch #PERST is attached to a GPIO pin and stays in the
> disabled state until Linux has booted. So at boot time only the PCIe
> root-complex is initialized by the kernel.
>
> After boot, if I run "lspci -v", everything looks good on the PCIe
> root-complex (below):
>
> ~ # lspci -v
> 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00
> [Normal decode])
> Flags: bus master, fast devsel, latency 0, IRQ 295
> Memory at 01000000 (32-bit, non-prefetchable) [size=1M]
> Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
> I/O behind bridge: None
> Memory behind bridge: None
> Prefetchable memory behind bridge: None
> [virtual] Expansion ROM at 01100000 [disabled] [size=64K]
> Capabilities: [40] Power Management version 3
> Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
> Capabilities: [70] Express Root Port (Slot-), MSI 00
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Virtual Channel
> Kernel driver in use: pcieport
>
>
> Then I enable the #PERST pin of the PLX switch; everything is still
> good (no rescan has been done on Linux yet):
>
> ~ # echo 139 > /sys/class/gpio/export
> ~ # echo out > /sys/class/gpio/gpio139/direction
> ~ # echo 1 > /sys/class/gpio/gpio139/value
> ~ # lspci -v
> 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00
> [Normal decode])
> Flags: bus master, fast devsel, latency 0, IRQ 295
> Memory at 01000000 (32-bit, non-prefetchable) [size=1M]
> Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
> I/O behind bridge: None
> Memory behind bridge: None
> Prefetchable memory behind bridge: None
> [virtual] Expansion ROM at 01100000 [disabled] [size=64K]
> Capabilities: [40] Power Management version 3
> Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
> Capabilities: [70] Express Root Port (Slot-), MSI 00
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Virtual Channel
> Kernel driver in use: pcieport
>
>
> Now I just disable (put in reset) the PLX switch (Linux doesn't see
> the switch yet, as no PCIe rescan was done). After that, "lspci -v"
> shows that the root-complex has gone bad:
>
> ~ # echo 0 > /sys/class/gpio/gpio139/value
> ~ # lspci -v
> 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00
> [Normal decode])
> Flags: fast devsel, IRQ 295
> Memory at 01000000 (64-bit, prefetchable) [disabled] [size=1M]
> Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
> I/O behind bridge: 00000000-00000fff [size=4K]
> Memory behind bridge: 00000000-000fffff [size=1M]
> Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
> [virtual] Expansion ROM at 01100000 [disabled] [size=64K]
> Capabilities: [40] Power Management version 3
> Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> Capabilities: [70] Express Root Port (Slot-), MSI 00
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Virtual Channel
> Kernel driver in use: pcieport
>
> ~ # uname -a
> Linux buildroot-2019.08-imx6 4.14.134-phy2 #1 SMP Thu Feb 20 12:13:33
> UTC 2020 armv7l GNU/Linux
> ~ #
>
>
> I am really not sure what is going wrong here. Am I missing
> something basic?

I agree, it looks like something's wrong, but I really don't have any
ideas.

I would start by using "lspci -xxxx" to see the actual values we get
from config space. It looks like we're reading zeros from at least
the bus and window registers.
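
If it helps, here is a rough Python sketch of the same check done
against the raw bytes, assuming the root port's config space is
readable from the usual sysfs "config" file. The function names and
the "0000:00:00.0" address (domain 0000) are my assumptions for
illustration; the offsets are the standard Type 1 (bridge) header
ones. It only decodes the command, bus number, and memory window
registers.

```python
import struct


def read_config(bdf="0000:00:00.0"):
    """Read raw config space from sysfs (the device address is an assumption)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read()


def decode_bridge_header(cfg):
    """Decode a few Type 1 (PCI-to-PCI bridge) header fields from raw bytes."""
    if len(cfg) < 0x40:
        raise ValueError("need at least the 64-byte standard header")
    (command,) = struct.unpack_from("<H", cfg, 0x04)
    primary, secondary, subordinate = struct.unpack_from("<BBB", cfg, 0x18)
    mem_base, mem_limit = struct.unpack_from("<HH", cfg, 0x20)
    return {
        "memory_space": bool(command & 0x2),  # Command register bit 1
        "bus_master": bool(command & 0x4),    # Command register bit 2
        "primary": primary,
        "secondary": secondary,
        "subordinate": subordinate,
        # Bits 15:4 of each window register hold address bits 31:20.
        "mem_base": (mem_base & 0xFFF0) << 16,
        "mem_limit": ((mem_limit & 0xFFF0) << 16) | 0xFFFFF,
    }


# Example use on the target, before and after toggling the GPIO:
#   print(decode_bridge_header(read_config()))
```

A memory window with base above limit is what lspci reports as
"None"; all-zero registers decode to 00000000-000fffff, which is what
the bad output above shows.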

You could also instrument the i.MX config accessors in case there's
something strange going on there. Maybe try to reproduce this on a
current upstream kernel?

Bjorn