Re: [PATCH v2] xhci: fix runtime PM imbalance in USB2 resume

From: Mathias Nyman
Date: Tue Mar 01 2022 - 10:47:50 EST


On 1.3.2022 12.28, Henry Lin wrote:
>>> USB2 resume starts with usb_hcd_start_port_resume() in port status
>>> change handling for RESUME link state. usb_hcd_end_port_resume() call is
>>> needed to keep runtime PM balance.
>
>> For normal usb2 port resume the usb_hcd_end_port_resume() is called when resume
>> has been signaled for long enough in xhci_handle_usb2_port_link_resume().
>>
>> This is also where driver directs the port to go from Resume state to U0.
>> Port can't do this without driver directing it.
>>
>> If there's a failure during resume signaling (disconnect, reset, error) then
>> stale resume variables are detected in xhci_get_port_status() and
>> usb_hcd_end_port_resume() is called.
>
>> I do now see a231ec41e6f6 ("xhci: refactor U0 link state handling in get_port_status")
>> does change order of checking and clearing stale resume variables, but this should
>> only happen if the first port state we read is a fully enabled functional U0 state after
>> a failed resume.
>
>> Could you expand a bit how this was detected?
> We observed the racing issue when usb2 device-initiated resume occurs in system resume.
> If usb2 host-initiated resume for system resume directs U0 before xhci_get_usb2_port_status()
> see RESUME state, xhci_get_usb2_port_status() will not finish resume process in
> xhci_handle_usb2_port_link_resume(). Its scenario is as follows:
>
> 1. System resume starts. All driver system resume callbacks get called in order. XHCI controller
> is resumed by xhci_resume().
> 2. USB2 root hub is resuming. hcd_bus_resume() is being executed.
> 3. Before xhci_bus_resume() is finished, XHCI driver receives a port status change event for
> an USB2 port with RESUME link state in xhci_irq(). XHCI driver starts the process to resume
> HS port for device-initiated resume.
> 4. In xhci_bus_resume(), host-initiated resume (direct U0) is performed on the same port that is
> resuming in step 3 in below loop:
>
> if (bus_state->bus_suspended) {
> spin_unlock_irqrestore(&xhci->lock, flags);
> msleep(USB_RESUME_TIMEOUT);
> spin_lock_irqsave(&xhci->lock, flags);
> }
> for_each_set_bit(port_index, &bus_state->bus_suspended,
> BITS_PER_LONG) {
> /* Clear PLC to poll it later for U0 transition */
> xhci_test_and_clear_bit(xhci, ports[port_index],
> PORT_PLC);
> xhci_set_link_state(xhci, ports[port_index], XDEV_U0);
> }
> 5. Then, link state of the resuming port is observed as U0 in following
> xhci_get_usb2_port_status(). xhci_handle_usb2_port_link_resume() has
> no chance to get called on the resuming port.
>

True, thanks for the explanation.
If there's a race between system resume and device-initiated resume, and port is resumed
in xhci_bus_resume() then yes I see how this could happen.

Maybe only call usb_hcd_end_port_resume() if xhci_irq() detected the device-initiated
resume. i.e. set a value to resume_done[portnum] and called usb_hcd_start_port_resume()
something like:

@@ -1088,6 +1088,8 @@ static void xhci_get_usb2_port_status(struct xhci_port *port, u32 *status,
if (link_state == XDEV_U2)
*status |= USB_PORT_STAT_L1;
if (link_state == XDEV_U0) {
+ if (bus_state->resume_done[portnum])
+ usb_hcd_end_port_resume(&port->rhub->hcd->self, portnum);
bus_state->resume_done[portnum] = 0;
clear_bit(portnum, &bus_state->resuming_ports);

Also xhci_bus_resume() only resumes ports that were forcefully suspended to U3 in xhci_bus_suspend().
Could be worth checking why that device wasn't already in U3 when suspend reached xhci_bus_suspend().

Thanks
Mathias