Re: [BUG] Bisected Problem with LSI PCI FC Adapter

From: Bjorn Helgaas
Date: Fri Sep 19 2014 - 14:39:54 EST


On Sat, Sep 13, 2014 at 09:41:34PM +0200, Dirk Gouders wrote:
> So, I did some tests on the VX50 which probably wasn't the worst idea,
> because it behaves different than the test machine.
>
> Summary:
>
> 1) Bjorn's back pocket patch works on the VX50.
>
> On the test machine it causes a trace, mount_root has to do with
> it. I tried to use netconsole but it complained the interface were
> not ready.

OK, that's good. I put this revert patch in for-linus for v3.17. I regard
this as a temporary fix, not the real solution. My guess is that the test
machine doesn't boot because you're missing a driver, so that failure is
probably not related to the revert patch.

> 3) Reset with Bjorn's commands
>
> DEV=00:0e.0
> setpci -s$DEV BRIDGE_CONTROL.W=0x0040
> sleep 1
> setpci -s$DEV BRIDGE_CONTROL.W=0x0000
> sleep 1
> echo 1 > /sys/bus/pci/rescan
>
> let the FC adapter appear but there are errors that I cannot really
> interpret.
>
> 4) Reset with Yinghai's patches and
>
> echo 1 > /sys/bus/pci/devices/0000\:00\:0e.0/pcie_link_disable
> echo 0 > /sys/bus/pci/devices/0000\:00\:0e.0/pcie_link_disable
> echo 1 > /sys/bus/pci/rescan
>
> gives a similar result to 3).

Resetting the device or simply disabling and re-enabling the link was
enough to fix things from the device's perspective. In both cases, it
responded as one would expect:

pci_scan_child_bus: pci_bus 0000:06: scanning bus
pci 0000:06:00.0: [1000:0646] type 00 class 0x0c0400
pci 0000:06:00.0: reg 0x10: [io 0x0000-0x00ff]
pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x00003fff 64bit]
pci 0000:06:00.0: reg 0x1c: [mem 0x00000000-0x0000ffff 64bit]
pci 0000:06:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]

Linux tried to assign MMIO space to the device, but failed:

pci 0000:06:00.0: BAR 6: assigned [mem 0xd4200000-0xd42fffff pref]
pci 0000:06:00.0: BAR 3: no space for [mem size 0x00010000 64bit]
pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x00010000 64bit]
pci 0000:06:00.0: BAR 1: no space for [mem size 0x00004000 64bit]
pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x00004000 64bit]

The upstream bridge windows are:

pci 0000:00:0e.0: PCI bridge to [bus 06] # was originally to bus 0a
pci 0000:00:0e.0: bridge window [io 0x3000-0x3fff]
pci 0000:00:0e.0: bridge window [mem 0xd4200000-0xd42fffff]

So the ROM BAR (reg 0x30/BAR 6) takes up the whole window, leaving nothing
for BARs 1 and 3. This is something that Linux could do better. For
example, we could assign normal BARs first, followed by ROM BARs, since the
normal ones are more important. It's possible we could also try to expand
the bridge window so all the BARs would fit.
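
To make the arithmetic concrete, here is a toy model of the assignment
above. It is only a sketch, not the kernel's resource allocator: the
first-fit cursor, the Bar class, and the assign() helper are hypothetical,
and the "largest first" ordering is an assumption chosen because it
reproduces the log. The window and BAR sizes are taken from the dmesg
output quoted above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Bar:
    name: str
    size: int            # power of two; a BAR is naturally aligned to its size
    is_rom: bool = False


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def assign(start: int, end: int, bars: List[Bar]) -> Dict[str, Optional[int]]:
    """First-fit assignment in the given order; end is inclusive.

    Hypothetical helper: returns the base address per BAR, or None if
    the BAR does not fit in what is left of the window.
    """
    cursor = start
    result: Dict[str, Optional[int]] = {}
    for bar in bars:
        base = align_up(cursor, bar.size)
        if base + bar.size - 1 <= end:
            result[bar.name] = base
            cursor = base + bar.size
        else:
            result[bar.name] = None
    return result


# Bridge window of 0000:00:0e.0 and BAR sizes of 0000:06:00.0 from the log.
WINDOW_START, WINDOW_END = 0xD4200000, 0xD42FFFFF
BARS = [
    Bar("BAR 1", 0x4000),
    Bar("BAR 3", 0x10000),
    Bar("BAR 6 (ROM)", 0x100000, is_rom=True),
]

# Assumed ordering that matches the log: largest resource first.
largest_first = sorted(BARS, key=lambda b: b.size, reverse=True)
# Alternative suggested above: normal BARs first, ROM BAR last.
rom_last = sorted(BARS, key=lambda b: (b.is_rom, -b.size))

observed = assign(WINDOW_START, WINDOW_END, largest_first)
alternative = assign(WINDOW_START, WINDOW_END, rom_last)

window_size = WINDOW_END - WINDOW_START + 1
total_demand = sum(b.size for b in BARS)

for label, layout in (("largest first", observed), ("ROM last", alternative)):
    print(label)
    for name, base in layout.items():
        print(f"  {name}: {'no space' if base is None else hex(base)}")
print(f"window {window_size:#x}, total demand {total_demand:#x}")
```

In this model the first ordering gives the ROM BAR the whole window and
leaves BARs 1 and 3 unassigned, as in the log. With the ROM BAR last,
BARs 1 and 3 both fit and only the ROM BAR is left out. The total demand
(0x114000) exceeds the 0x100000 window either way, which is why fitting
everything would also require expanding the bridge window.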

In any case, resetting the device is not a complete fix by itself. So
our possibilities are:

1) Quirk to work around _CRS bug. This works but requires us to maintain
CPU-specific code that I really don't want.

2) Reset device when changing bus number. This works from the device
point of view, but would require additional Linux changes.

3) Revert 1820ffdccb9b. This works but is ugly because we're ignoring
some of what _CRS tells us.

Bjorn