Re: OHCI unplug kernel crash in kernel 4.3 and 4.4

From: Alan Stern
Date: Thu Jan 14 2016 - 11:30:13 EST


On Thu, 14 Jan 2016, Joerg Roedel wrote:

> On Thu, Jan 14, 2016 at 03:01:14PM +0100, Oliver Neukum wrote:
> > On Thu, 2016-01-14 at 13:50 +0100, Stefani Seibold wrote:
> > > An unplug of a USB 1.0 OHCI controller express card will result in a
> > > kernel crash. The express card is attached via Thunderbolt and a Sonnet
> > > express-card-to-Thunderbolt adapter. The computer hangs after the
> > > unplug; only a power cycle fixes the situation.
> > >
> > > This is the kernel log of a kernel 4.4 via netconsole:
> > >
> > > pciehp 0000:06:03.0:pcie24: Card not present on Slot(3)
> > > pciehp 0000:06:03.0:pcie24: slot(3): Link Down event
> > > pciehp 0000:06:03.0:pcie24: Link Down event ignored on slot(3): already powering off
> > > ehci-pci 0000:0b:00.2: HC died; cleaning up
> > > ehci-pci 0000:0b:00.2: remove, state 4
> > > usb usb5: USB disconnect, device number 1
> > > pciehp 0000:00:1c.4:pcie04: Card not present on Slot(4)
> > > pciehp 0000:00:1c.4:pcie04: slot(4): Link Down event
> > > ehci-pci 0000:0b:00.2: USB bus 5 deregistered
> > > ohci-pci 0000:0b:00.1: HC died; cleaning up
> > > ohci-pci 0000:0b:00.1: remove, state 4
> > > usb usb7: USB disconnect, device number 1
> > > pciehp 0000:00:1c.4:pcie04: Link Down event ignored on slot(4): already powering off
> > > ohci-pci 0000:0b:00.1: USB bus 7 deregistered
> > > ohci-pci 0000:0b:00.0: HC died; cleaning up
> > > ohci-pci 0000:0b:00.0: remove, state 4
> > > usb usb6: USB disconnect, device number 1
> > > ------------[ cut here ]------------
> > > kernel BUG at drivers/iommu/intel-iommu.c:3592!
> >
> > This is likely the crucial information. The IOMMU is unhappy.
>
> The only explanation for this is that the device driver calls into the
> iommu code after the BUS_NOTIFY_REMOVED_DEVICE notifier ran. I see in
> the stack-trace below that device_release_driver() is called from an
> unusual place in pci code.
>
> Unless there is a good explanation for calling device_release_driver()
> after the BUS_NOTIFY_REMOVED_DEVICE notifier, this is no bug in the
> iommu code.

I don't think that is what happened. The BUS_NOTIFY_REMOVED_DEVICE
notifier is sent by device_del(), which is called (indirectly) by
pci_remove_bus_device(). The device driver's remove routine -- which
is what called into the iommu code -- is invoked through
device_release_driver(), which is called (indirectly) by
pci_stop_bus_device().

Since pci_stop_and_remove_bus_device() calls pci_stop_bus_device()
before pci_remove_bus_device(), the notifier does not get sent until
after the iommu code runs.
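
The ordering can be sketched as a small user-space model. This is only an
illustration of the call sequence described above, not kernel source: every
function prefixed model_ is a hypothetical stand-in for the kernel routine
named in its comment, the "(indirectly)" steps are collapsed into direct
calls, and the event strings are made up for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified model only. The model_* names are hypothetical stand-ins,
 * not the kernel's real functions or signatures. */

#define MAX_EVENTS 8
static const char *events[MAX_EVENTS];
static int n_events;

/* Append an event to the log and print it with its sequence number. */
static void record(const char *what)
{
    assert(n_events < MAX_EVENTS);
    events[n_events++] = what;
    printf("%d: %s\n", n_events, what);
}

/* Return the position of an event in the log, or -1 if it never occurred. */
static int event_index(const char *what)
{
    for (int i = 0; i < n_events; i++)
        if (strcmp(events[i], what) == 0)
            return i;
    return -1;
}

/* Stand-in for the driver's remove routine, which calls into the iommu code. */
static void model_driver_remove(void)
{
    record("driver remove calls into iommu code");
}

/* Stand-in for device_release_driver(): invokes the driver's remove routine. */
static void model_release_driver(void)
{
    model_driver_remove();
}

/* Stand-in for pci_stop_bus_device(): reaches device_release_driver(). */
static void model_stop_bus_device(void)
{
    model_release_driver();
}

/* Stand-in for device_del(): sends the BUS_NOTIFY_REMOVED_DEVICE notifier. */
static void model_device_del(void)
{
    record("BUS_NOTIFY_REMOVED_DEVICE notifier sent");
}

/* Stand-in for pci_remove_bus_device(): reaches device_del(). */
static void model_remove_bus_device(void)
{
    model_device_del();
}

/* Stand-in for pci_stop_and_remove_bus_device(): stop first, then remove. */
static void model_stop_and_remove_bus_device(void)
{
    model_stop_bus_device();
    model_remove_bus_device();
}
```

Calling model_stop_and_remove_bus_device() once logs the driver's remove
routine first and the notifier second, which is the order argued above.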

Alan Stern