Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

From: Mathias Nyman
Date: Thu Dec 01 2016 - 08:27:44 EST


On 01.12.2016 06:54, Baolin Wang wrote:
On 30 November 2016 at 22:09, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
On 30.11.2016 11:02, Baolin Wang wrote:

If the hardware never responds to the stop endpoint command, the
URBs will never be completed, and we might hang the USB subsystem.
The original watchdog timer is used to watch if one stop endpoint
command is timeout, if timeout, then the watchdog timer will set
XHCI_STATE_DYING, try to halt the xHCI host, and give back all
pending URBs.

But now we already have one command timer to control command timeout,
thus we can also use the command timer to watch the stop endpoint
command, instead of one duplicate watchdog timer which need to be
removed.

Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
this is the last stop endpoint command of one endpoint. Since we
can make sure we only set one stop endpoint command for one endpoint
by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
this flag.

We also need to clean up the command queue before trying to halt the
xHCI host in xhci_stop_endpoint_command_timeout() function.


This isn't a bad idea.

There are anyway some corner cases and details that need to be
checked, such as suspend (which will clear the command queue), module unload
and abrupt host removal (like pci hotplug removal of host controller)
we need to make sure we can trust the command timer to always return the
canceled URB

Yes, you are right, we need to check these carefully.

Suspend process, module unload and abrupt host removal, they all will
issue usb_disconnect() firstly before clear the command queue, it will
check URBs for every endpoint by
usb_disconnect()--->usb_disable_device()--->usb_disable_endpoint(),
which will make sure every URBs of endpoints will be cancelled by the
stop endpoint command responding or the timeout function of stop
endpoint command (xhci_stop_endpoint_command_timeout()) in
usb_hcd_flush_endpoint(). From that point, we can make sure the
command timer will be useful to handle stop endpoint command timeout.
Please correct me if I said something wrong. Thanks.


This relies on current queued command that times out to be the stop endpoint command.

If host partially, or completely hangs there might be any number of commands in the
command queue, and we would need to wait for each one of them to timeout, finish
before we finally get to the stop endpoint command, and give back the urb.

I think it would be better to first fix the issues with the current watchdog function, get
those fixes into stable, and then think about moving to the command queue timer.

In short, this patch doesn't currently fix any existing issue, but might cause the
timeout to be more unreliable

-Mathias