Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock

From: Chris Bainbridge
Date: Thu Feb 04 2016 - 17:06:35 EST


On Thu, Feb 04, 2016 at 04:00:51PM -0500, Alan Stern wrote:
> On Thu, 4 Feb 2016, Chris Bainbridge wrote:
>
> > The XHCI controller presents two USB buses to the system - one for USB 2
> > and one for USB 3. When only one bus is locked there is a race condition
> > during hub init that results in errors like:
> >
> > [ 13.183701] usb 3-3: device descriptor read/all, error -110
>
> What exactly is the race condition? Why does locking both buses fix
> it?

[ 2.692571] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 2.693279] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 3
[ 2.694867] xhci_hcd 0000:00:14.0: hcc params 0x20007181 hci version 0x100 quirks 0x0000b930
[ 2.694880] xhci_hcd 0000:00:14.0: cache line size of 256 is not supported
[ 2.695995] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[ 2.696005] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.696016] usb usb3: Product: xHCI Host Controller
[ 2.696024] usb usb3: Manufacturer: Linux 4.4.0 xhci-hcd
[ 2.696031] usb usb3: SerialNumber: 0000:00:14.0
[ 2.698414] hub 3-0:1.0: USB hub found
[ 2.704723] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 2.705502] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 4
[ 2.706136] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
[ 2.706138] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.706139] usb usb4: Product: xHCI Host Controller
[ 2.706140] usb usb4: Manufacturer: Linux 4.4.0 xhci-hcd
[ 2.706141] usb usb4: SerialNumber: 0000:00:14.0
[ 2.708467] hub 4-0:1.0: USB hub found
[ 3.021779] usb 3-3: new high-speed USB device number 2 using xhci_hcd
[ 8.034843] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
...bus4 enumeration...
[ 13.297835] usb 3-3: device descriptor read/all, error -110
...bus3 enumeration...

Note there seem to be two timeouts of 5 seconds/10 seconds after the
last message at 3.021779 seconds.

hub_port_init is called in parallel for both buses.
The first thread is in usb_get_device_descriptor when the second one
enters the function and calls the code to get an address. I don't know
precisely how it fails - it looks like the functions for doing the
initialisation are synchronous and sleeping waiting for a response and
that gets disrupted when the second thread tries to initialise the hub.
What was the basis for using a lock on the bus rather than the
controller? Does the spec say that buses of the same controller can be
initialised in parallel? Mathias previously said:

> Just found an additional note in the xhci specs section 4.5.3 saying that:
> "Note: Software shall not transition more than one Device Slot to the Default State at a time"
> which is what xhci_setup_device() does in addition to moving slots to the addressed state

But I don't know if that means you can do the reset/set address/read
descriptors in parallel?


> > @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
> > if (oldspeed == USB_SPEED_LOW)
> > delay = HUB_LONG_RESET_TIME;
> >
> > - mutex_lock(&hdev->bus->usb_address0_mutex);
> > + mutex_lock(&hdev->bus->controller->mutex);
> >
> > /* Reset the device; full speed may morph to high speed */
> > /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
> > @@ -4588,7 +4588,7 @@ fail:
> > hub_port_disable(hub, port1, 0);
> > update_devnum(udev, devnum); /* for disconnect processing */
> > }
> > - mutex_unlock(&hdev->bus->usb_address0_mutex);
> > + mutex_unlock(&hdev->bus->controller->mutex);
> > return retval;
> > }
>
> I don't think this is a good idea. The driver core needs to be able to
> access the controller while this function is running. You can
> introduce a new mutex if you want, perhaps in the primary hcd
> structure, but don't use bus->controller->mutex.

An explicit lock might be a good idea. I was trying to avoid adding
another lock so used the one in struct device as it appeared unused. The
XHCI code seems to only use the lock in struct xhci_hcd and ehci uses
struct ehci->lock.

btw I think this bug may be the same as reported at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437492