RE: xhci_pci & PCIe hotplug crash

From: David Laight
Date: Wed May 05 2021 - 11:20:18 EST

Next message: Benjamin Gaignard: "Re: [PATCH v10 6/9] media: uapi: Add a control for HANTRO driver"
Previous message: Matthew Wilcox (Oracle): "[PATCH v9 11/96] mm/vmstat: Add functions to account folio statistics"
In reply to: Pali Rohár: "Re: xhci_pci & PCIe hotplug crash"
Next in thread: Pali Rohár: "Re: xhci_pci & PCIe hotplug crash"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

From: Pali Rohár
> Sent: 05 May 2021 14:03
...
> I already figured out that the CPU also receives an external abort when
> trying to issue a new PIO transfer for accessing PCI config space while
> the previous transfer has not finished yet. There is also no way (at
> least in the documentation) to "mask" this external abort. But this
> issue can be fixed in the pci-aardvark.c driver by disallowing access to
> config space while the previous transfer is still running (I will send
> a patch for this one).
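A minimal userspace sketch of "disallow access while the previous transfer is still running": before issuing a new PIO transfer, poll a busy flag a bounded number of times and fail rather than touch the hardware. The pio_engine struct, its busy flag and the poll counts are hypothetical stand-ins, not the real pci-aardvark.c registers; in kernel code a helper such as readl_poll_timeout() from <linux/iopoll.h> would normally do the polling.

```c
#include <stdbool.h>

/* Hypothetical model of a PIO engine; the real pci-aardvark.c register
 * names and bits are not reproduced here. */
struct pio_engine {
    bool busy;        /* set while a config-space transfer is in flight */
    int  polls_left;  /* simulated status reads until the hardware goes idle */
};

/* Simulated status read: an in-flight transfer completes after
 * polls_left reads. */
static bool pio_is_busy(struct pio_engine *e)
{
    if (e->busy && e->polls_left > 0 && --e->polls_left == 0)
        e->busy = false;
    return e->busy;
}

/* Refuse to start a new transfer until the previous one has finished,
 * giving up after max_polls attempts instead of poking the hardware. */
static int pio_start(struct pio_engine *e, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!pio_is_busy(e)) {
            e->busy = true;     /* issue the new transfer */
            e->polls_left = 3;  /* arbitrary simulated duration */
            return 0;
        }
    }
    return -1;                  /* still busy: caller reports an error */
}
```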

By the sound of the above, you need to put a global spinlock around
all PCIe config space accesses.
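A rough userspace model of that global lock, assuming a C11 atomic_flag spinlock as a stand-in for a kernel raw spinlock taken with raw_spin_lock_irqsave(). cfg_lock, cfg_read() and the "aborts" counter are hypothetical names; the fake controller just counts how often a second transfer would have been issued while one was in flight.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One lock shared by every config-space accessor. */
static atomic_flag cfg_lock = ATOMIC_FLAG_INIT;

/* Simulated controller state: "aborts" counts transfers issued while
 * another one was still in flight. */
static bool transfer_in_flight;
static int aborts;

static void cfg_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&cfg_lock, memory_order_acquire))
        ;   /* spin until the current holder releases the lock */
}

static bool cfg_lock_try(void)
{
    return !atomic_flag_test_and_set_explicit(&cfg_lock, memory_order_acquire);
}

static void cfg_lock_release(void)
{
    atomic_flag_clear_explicit(&cfg_lock, memory_order_release);
}

/* Hypothetical config read: the whole issue/wait/complete sequence runs
 * under the lock, so two CPUs can never overlap on the PIO engine. */
static uint32_t cfg_read(uint32_t where)
{
    cfg_lock_acquire();
    if (transfer_in_flight)
        aborts++;                           /* what the real HW punishes */
    transfer_in_flight = true;
    uint32_t val = 0xA0000000u | where;     /* fake config data */
    transfer_in_flight = false;
    cfg_lock_release();
    return val;
}
```

This only serializes config accesses against each other. Ordinary MMIO issued by other CPUs never takes cfg_lock, which is why it matters whether 'normal' transfers can overlap a config access at all.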

Is this the horrid hardware that can't do a 'normal' PCIe transfer
while a config space access is in progress?
If that is true, then you have bigger problems,
especially if it is an SMP system.

> So it seems that the PCIe controller HW triggers these external aborts
> when a device on the PCIe bus is no longer accessible.
>
> If this issue is really caused by MMIO access from the xhci driver when
> the device is no longer accessible on the bus, can we do something to
> prevent this kernel crash? Somehow mask that external abort in the
> kernel for the duration of the MMIO access?

If it is a cycle abort, then the interrupted address is probably
that of the MMIO instruction.
So you need to catch the abort, emulate the instruction, and
then return to the next one.
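A toy model of "catch the abort, emulate the instruction, return to the next one" for a read. Everything here is hypothetical: fake_regs only loosely mimics saved arm64 registers, the destination register and width are passed in rather than decoded, and returning all-ones (what a PCIe read from a missing device usually yields) is an assumed choice of emulated value. It also assumes fixed 4-byte instructions, as on arm64.

```c
#include <stdint.h>

/* Hypothetical saved CPU state, loosely shaped like arm64 registers. */
struct fake_regs {
    uint64_t pc;
    uint64_t x[31];
};

/* What the handler must know about the faulting load. Here it is passed
 * in; a real handler would have to decode it from the instruction. */
struct mmio_load {
    unsigned dest_reg;  /* register the load was writing */
    unsigned size;      /* access width in bytes: 1, 2, 4 or 8 */
};

/* Emulate the aborted load as if the device returned all-ones, then
 * resume at the following (assumed 4-byte) instruction. */
static void emulate_aborted_load(struct fake_regs *regs, struct mmio_load ld)
{
    uint64_t ones = (ld.size == 8) ? ~0ull : ((1ull << (ld.size * 8)) - 1);
    regs->x[ld.dest_reg] = ones;
    regs->pc += 4;
}
```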

This probably requires an exception table containing the address
of every readb/w/l() instruction.
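Such an exception table could look something like this sketch: one entry per MMIO read site, mapping its address to a resume address, kept sorted so the abort handler can binary-search it with bsearch(). The entry layout, the addresses and mmio_search_fixup() are made up for illustration. The kernel's existing exception tables (used for user-access fixups) have their own arch-specific format and are populated from annotations at each access site rather than written by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: address of one MMIO read instruction and where
 * execution should continue if that read aborts. */
struct mmio_extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Must stay sorted by insn for bsearch(). Addresses are fake. */
static const struct mmio_extable_entry mmio_extable[] = {
    { 0x1000, 0x1004 },
    { 0x1040, 0x1044 },
    { 0x2010, 0x2014 },
};

static int cmp_insn(const void *key, const void *elem)
{
    uintptr_t pc = *(const uintptr_t *)key;
    uintptr_t insn = ((const struct mmio_extable_entry *)elem)->insn;
    return (pc > insn) - (pc < insn);
}

/* Return the fixup address for a faulting PC, or 0 if the PC is not a
 * listed readb/w/l() site (then the abort is a genuine bug). */
static uintptr_t mmio_search_fixup(uintptr_t pc)
{
    const struct mmio_extable_entry *e =
        bsearch(&pc, mmio_extable,
                sizeof(mmio_extable) / sizeof(mmio_extable[0]),
                sizeof(mmio_extable[0]), cmp_insn);
    return e ? e->fixup : 0;
}
```

Using 0 for "not found" assumes no real fixup lives at address 0.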

If you get a similar error on writes, it is likely to be reported a few
instructions after the actual writeb/w/l() instruction.
Writes are normally 'posted' and asynchronous.
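A toy simulation of why a write abort shows up late: the store retires immediately, the CPU keeps executing, and the error is only delivered when the posted write reaches the bus. The drain delay and the 4-byte instruction size are arbitrary assumptions; the point is only that the reported PC no longer matches the writeb/w/l() site, so an exact-address table entry for the store would not match it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a posted write: the abort only surfaces when the write
 * reaches the (missing) device, drain_delay instructions later. */
struct posted_write {
    uint64_t issued_pc;  /* PC of the writel() store */
    int drain_delay;     /* instructions until the write hits the bus */
};

/* Step through up to n_insns instructions (4 bytes each) starting at the
 * store and return the PC at which the abort is delivered, or 0 if no
 * abort arrives. */
static uint64_t abort_pc_for(struct posted_write w, bool device_present,
                             int n_insns)
{
    uint64_t pc = w.issued_pc;
    for (int i = 0; i < n_insns; i++) {
        if (i == w.drain_delay && !device_present)
            return pc;   /* abort reported against the current PC */
        pc += 4;         /* the CPU has already moved on */
    }
    return 0;
}
```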

If you are really lucky, you can get enough state out of the
abort handler to fix up/ignore the cycle without an
exception table.
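On arm64 (an assumption here, since the Aardvark controller is used on Marvell Armada 37xx SoCs), the "really lucky" case would be a synchronous data abort whose syndrome has the ISV bit set: ESR_ELx then reports the transfer register, access size and direction, which is enough to emulate or skip the access without a table. The bit positions follow the Arm ARM ISS encoding for data aborts; the macro and function names are hypothetical. If the abort arrives asynchronously as an SError, or the instruction is one for which ISV is not set (e.g. a load pair or a writeback form), this information is not available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an arm64 ESR_ELx value for a data abort. */
#define ESR_EC_SHIFT     26
#define ESR_EC_MASK      0x3fu
#define ESR_EC_DABT_CUR  0x25u       /* data abort taken without EL change */
#define ESR_ISV          (1u << 24)  /* SAS/SRT/WnR below are valid */
#define ESR_SAS_SHIFT    22          /* access size is 1 << SAS bytes */
#define ESR_SRT_SHIFT    16          /* transfer register number */
#define ESR_WNR          (1u << 6)   /* 1 = write, 0 = read */
#define ESR_DFSC_MASK    0x3fu
#define DFSC_SEA         0x10u       /* synchronous external abort */

struct dabt_info {
    unsigned reg;    /* Xn used by the load/store */
    unsigned size;   /* bytes */
    bool is_write;
};

/* Return true and fill *out if the syndrome alone is enough to emulate
 * or skip the access, i.e. no exception table would be needed. */
static bool decode_mmio_abort(uint64_t esr, struct dabt_info *out)
{
    if (((esr >> ESR_EC_SHIFT) & ESR_EC_MASK) != ESR_EC_DABT_CUR)
        return false;
    if ((esr & ESR_DFSC_MASK) != DFSC_SEA)
        return false;
    if (!(esr & ESR_ISV))
        return false;                /* unlucky: fall back to a table */
    out->reg = (esr >> ESR_SRT_SHIFT) & 0x1f;
    out->size = 1u << ((esr >> ESR_SAS_SHIFT) & 0x3);
    out->is_write = (esr & ESR_WNR) != 0;
    return true;
}
```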

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
