Re: TI PCIe xHCI and kexec

From: Joel Stanley
Date: Wed Feb 05 2020 - 22:38:15 EST


On Wed, 5 Feb 2020 at 09:35, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>
> On 5.2.2020 2.55, Joel Stanley wrote:
> > I'm supporting a system that uses Linux-as-a-bootloader to load a
> > distro kernel via kexec. The systems have a TI TUSB73x0 PCIe xHCI
> > controller which goes out to lunch after a kexec. This is the distro
> > (post-kexec) kernel:
> >
> > [ 0.235411] pci 0003:01:00.0: xHCI HW did not halt within 16000 usec status = 0x0
> > [ 1.037298] xhci_hcd 0003:01:00.0: xHCI Host Controller
> > [ 1.037367] xhci_hcd 0003:01:00.0: new USB bus registered, assigned bus number 1
> > [ 1.053481] xhci_hcd 0003:01:00.0: Host halt failed, -110
> > [ 1.053523] xhci_hcd 0003:01:00.0: can't setup: -110
> > [ 1.053565] xhci_hcd 0003:01:00.0: USB bus 1 deregistered
> > [ 1.053629] xhci_hcd 0003:01:00.0: init 0003:01:00.0 fail, -110
> > [ 1.053703] xhci_hcd: probe of 0003:01:00.0 failed with error -110
> >
> > There were some fixes made a few years back to improve the situation,
> > but we've still had to carry some form of the patch below in the
> > bootloader kernel. I would like to rework it so it can be merged.
> >
> > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > index dbac0fa9748d..eaa94456dd9d 100644
> > --- a/drivers/usb/host/xhci.c
> > +++ b/drivers/usb/host/xhci.c
> > @@ -789,6 +789,9 @@ void xhci_shutdown(struct usb_hcd *hcd)
> >  	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
> >  			"xhci_shutdown completed - status = %x",
> >  			readl(&xhci->op_regs->status));
> > +
> > +	/* TI XHCI controllers do not come back after kexec without this hack */
> > +	pci_reset_function_locked(to_pci_dev(hcd->self.sysdev));
> >  }
> >  EXPORT_SYMBOL_GPL(xhci_shutdown);
> >
> > I would like some advice on how to implement it in a way that is
> > acceptable. Would a quirk on the PCI ID in xhci_shutdown() be ok?
>
> Yes, but as this is a PCI-specific workaround, the quirk should go in
> xhci-pci.c: xhci_pci_shutdown(), which was added in v5.5.
>
> Is the root cause known?
> Is resetting the PCI function the only possible solution?

I don't know the root cause. The people that helped debug it in the
first place have moved on.

> Have you tried, or seen this issue on any other controller than this TUSB73x0?

We don't have any systems with a different USB controller.

In general, the other PCIe devices in the system are well (enough)
behaved to survive kexec. We don't carry any other out-of-tree
workarounds.

>
> >
> > 0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
> >
> > The full debug log of the distro kernel booting is below.
> >
> > [ 1.037833] xhci_hcd 0003:01:00.0: USBCMD 0x0:
> > [ 1.037835] xhci_hcd 0003:01:00.0: HC is being stopped
> > [ 1.037837] xhci_hcd 0003:01:00.0: HC has finished hard reset
> > [ 1.037839] xhci_hcd 0003:01:00.0: Event Interrupts disabled
> > [ 1.037841] xhci_hcd 0003:01:00.0: Host System Error Interrupts disabled
> > [ 1.037843] xhci_hcd 0003:01:00.0: HC has finished light reset
> > [ 1.037846] xhci_hcd 0003:01:00.0: USBSTS 0x0:
> > [ 1.037847] xhci_hcd 0003:01:00.0: Event ring is empty
> > [ 1.037849] xhci_hcd 0003:01:00.0: No Host System Error
> > [ 1.037851] xhci_hcd 0003:01:00.0: HC is running
>
> Hmm, all bits in both USBCMD and USBSTS are 0. This is a bit suspicious.
> Normally at least the USBCMD Run/Stop bit and the USBSTS HCHalted bit
> have opposite values.

Does this suggest the controller is not responding at all?