Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset

From: Andreas Hartmann
Date: Mon Jan 12 2015 - 10:23:51 EST


Alex Williamson wrote:
> On Thu, 2015-01-08 at 09:07 -0700, Bjorn Helgaas wrote:
>> On Fri, Nov 21, 2014 at 11:24:27AM -0700, Alex Williamson wrote:
>>> Reports against the TL-WDN4800 card indicate that PCI bus reset of
>>> this Atheros device cause system lock-ups and resets. I've also
>>> been able to confirm this behavior on multiple systems. The device
>>> never returns from reset and attempts to access config space of the
>>> device after reset result in hangs. Blacklist bus reset for the
>>> device to avoid this issue.
>>>
>>> Reported-by: Andreas Hartmann <andihartmann@xxxxxxxxxx>
>>> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
>>> Tested-by: Andreas Hartmann <andihartmann@xxxxxxxxxx>
>>
>> If I understand correctly, these two (patches 3 & 4) fix a v3.14 regression
>> caused by 425c1b223dac ("PCI: Add Virtual Channel to save/restore support").
>>
>> If so, these should go to for-linus for v3.19. What about patches 1 & 2?
>> Do they fix a regression? Is there a pointer to a bugzilla or problem
>> report about that issue?
>>
>> I don't understand the connection between 425c1b223dac and
>> PCI_DEV_FLAGS_NO_BUS_RESET, because 425c1b223dac doesn't seem to do any
>> resets. Is that the wrong commit, or can you outline the connection for
>> me?
>
> TBH, I don't have a lot of faith in associating this to 425c1b223dac,
> I'm not sure how Andreas' bisect landed there.

Because removing this patch made it work again :-)

See also:
http://thread.gmane.org/gmane.linux.kernel.pci/35170/focus=35984

Kernels 3.10, 3.12 and 3.13 worked fine for me. 3.14 is the first
kernel that hangs the machine at startup of the VM. The userland
(qemu) didn't change in between.

Therefore, from my point of view, it is a regression, because things
worked before 3.14.

Besides that: there is no doubt that resetting this card is a problem.
But the difference between >= 3.14 and < 3.14 is that < 3.14 worked
nevertheless. Commit 425c1b223dac456d00a61fd6b451b6d1cf00d065 obviously
changed something, though I can't say what, and I don't know of any
other cause. Therefore, the quirk patch is definitely required, because
things work completely fine again with it.

"Working" means for me here: I was able to start (and use) the VM w/o
crashing the machine, and this isn't possible w/ unpatched 3.14+ any
more. Yes, w/ 3.12, I wasn't able to restart the VM (it then crashed the
machine), but w/ 3.10 even this was possible.


> IME, this device cannot,
> and has never been able to handle a bus reset. A simple setpci
> experiment on the commandline can confirm this. What I think happened
> is that with the PCI bus reset infrastructure we added, we switched QEMU
> to prefer PCI bus resets over things like PM D3hot->D0 resets. So it's
> just more prolific use of bus resets by userspace.
>
> There's also no regression in 1 & 2, PM reset has never done anything
> useful on those devices. Thanks,
>
> Alex
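
For readers following along, a rough sketch of what a bus reset amounts
to at the register level. The exact setpci experiment is not spelled out
above, so this assumes it toggles the Secondary Bus Reset bit (0x40) in
the Bridge Control register (config offset 0x3e) of the upstream bridge.
The helper names sbr_assert() and sbr_deassert() are made up for
illustration, and the code only models the bit arithmetic in userspace;
it does not touch any hardware.

```c
#include <stdint.h>

/* Register offset and bit as defined by the PCI-to-PCI bridge header. */
#define BRIDGE_CONTROL_OFFSET 0x3e  /* Bridge Control register */
#define BRIDGE_CTL_BUS_RESET  0x40  /* Secondary Bus Reset bit */

/* Hypothetical helper: value to write to start the reset. */
static uint16_t sbr_assert(uint16_t bridge_ctrl)
{
    return (uint16_t)(bridge_ctrl | BRIDGE_CTL_BUS_RESET);
}

/* Hypothetical helper: value to write to release the reset again. */
static uint16_t sbr_deassert(uint16_t bridge_ctrl)
{
    return (uint16_t)(bridge_ctrl & ~BRIDGE_CTL_BUS_RESET);
}
```

Only that one bit changes; the other Bridge Control bits are written
back as they were read. The hang described above would show up on the
first config access to the device after the bit is released.
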
>
>>> ---
>>>
>>> drivers/pci/quirks.c | 14 ++++++++++++++
>>> 1 file changed, 14 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 561e10d..ebbd5b4 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -3029,6 +3029,20 @@ static void quirk_no_pm_reset(struct pci_dev *dev)
>>> DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
>>> PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
>>>
>>> +static void quirk_no_bus_reset(struct pci_dev *dev)
>>> +{
>>> + dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
>>> +}
>>> +
>>> +/*
>>> + * Atheros AR93xx chips do not behave after a bus reset. The device will
>>> + * throw a Link Down error on AER capable system and regardless of AER,
>>> + * config space of the device is never accessible again and typically
>>> + * causes the system to hang or reset when access is attempted.
>>> + * http://www.spinics.net/lists/linux-pci/msg34797.html
>>> + */
>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
>>> +
>>> #ifdef CONFIG_ACPI
>>> /*
>>> * Apple: Shutdown Cactus Ridge Thunderbolt controller.
>>>
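
To make the effect of the quirk above easier to see, here is a small
userspace model of it. This is only a sketch, not the kernel's code:
struct mock_pci_dev, MOCK_NO_BUS_RESET (with an arbitrary bit value),
mock_apply_fixups() and mock_bus_resettable() are all hypothetical
stand-ins. It assumes the reset path refuses a bus reset as soon as any
device on the bus carries the flag, and it uses 0x168c for
PCI_VENDOR_ID_ATHEROS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PCI_DEV_FLAGS_NO_BUS_RESET; the bit value is arbitrary. */
#define MOCK_NO_BUS_RESET (1u << 0)

struct mock_pci_dev {
    unsigned short vendor;
    unsigned short device;
    unsigned int dev_flags;
};

/* Same shape as quirk_no_bus_reset() in the patch. */
static void mock_quirk_no_bus_reset(struct mock_pci_dev *dev)
{
    dev->dev_flags |= MOCK_NO_BUS_RESET;
}

/* Models the fixup declaration: run the quirk only on a vendor/device match. */
static void mock_apply_fixups(struct mock_pci_dev *dev)
{
    if (dev->vendor == 0x168c && dev->device == 0x0030)
        mock_quirk_no_bus_reset(dev);
}

/* Assumed policy: one flagged device makes the whole bus non-resettable. */
static bool mock_bus_resettable(const struct mock_pci_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].dev_flags & MOCK_NO_BUS_RESET)
            return false;
    return true;
}
```

With the flag set, a caller that would otherwise have picked a bus reset
has to fall back to another reset method or skip the reset.
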