Re: [PATCH] xhci: clear port_remote_wakeup after resume failure

From: Nicolas Saenz Julienne
Date: Sat Jun 08 2019 - 09:38:01 EST


On Tue, 2019-06-04 at 16:53 +0300, Mathias Nyman wrote:
> On 27.5.2019 14.28, Nicolas Saenz Julienne wrote:
> > Hi Matthias,
> > thanks for the review.
> >
> > On Mon, 2019-05-27 at 14:16 +0300, Mathias Nyman wrote:
> > > On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
> > > > This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
> > > > Ethernet device interfaces with the laptop through one of it's USB3
> > > > ports. While idle, the Ethernet device and HCD are suspended by runtime
> > > > PM, being the only device connected on the bus. Then, both are resumed
> > > > on
> > > > behalf of the Ethernet device, which has remote wake-up capabilities.
> > > >
> > > > The Ethernet device was observed to randomly disconnect from the USB
> > > > port shortly after submitting it's remote wake-up request. Probably a
> > > > weird timing issue yet to be investigated. This causes runtime PM to
> > > > busyloop causing some tangible CPU load. The reason is the port gets
> > > > stuck in the middle of a remote wake-up operation, waiting for the
> > > > device to switch to U0. This never happens, leaving "port_remote_wakeup"
> > > > enabled, and automatically triggering a failure on any further suspend
> > > > operation.
> > > >
> > > > This patch clears "port_remote_wakeup" upon detecting a device with a
> > > > wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
> > > > above mentioned situation doesn't trigger a PM busyloop.
> > > >
> > >
> > > There was a similar case where the USB3 link had transitioned to a
> > > lower power U1 or U2 state after resume, before driver read the state,
> > > leaving port_remote_wakeup flag uncleared.
> > >
> > > This was fixed in 5.1 kernel by commit:
> > >
> > > 6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable
> > >
> > > Can you check if you have it?
> > > It should be in recent stable releases as well.
> >
> > I was aware of that patch, unfortunately it doesn't address the same issue.
> > In
> > my case I never get a second port status event (so no PLC == 1 or any state
> > change seen in PLS). The device simply disconnects from the bus.
> >
>
> I see, ok, then we need to clear the flag in the hub thread.
>
> But to me it looks like this patch could cause a small race risk in the
> successful
> device initiated resume cases.
>
> If the hub thread, i.e. the get_port_status() function, notices the U0 state
> before
> the interrupt handler, i.e. handle_port_status() function, then
> port_remote_wakeup
> flag is cleared in the hub thread and the wakeup notification is never called
> from
> handle_port_status().
>
> Would it be enough to just check for (port_remote_wakeup flag &&
> !PORT_CONNECT) in the hub thread?
> USB3 PORT_CONNECT bit is lost in most error cases.

I get your concerns, there is a race indeed. On top of that, checking
PORT_CONNECT works fine for me.

So if I undertood your suggestion right, would this be fine?

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 3abe70ff1b1e..253957dc62de 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1057,6 +1057,9 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
bus_state->resume_done[wIndex] = 0;
clear_bit(wIndex, &bus_state->resuming_ports);
usb_hcd_end_port_resume(&hcd->self, wIndex);
+ } else if (bus_state->port_remote_wakeup & (1 << port->hcd_portnum) &&
+ !(raw_port_status & PORT_CONNECT)) {
+ bus_state->port_remote_wakeup &= ~(1 << port->hcd_portnum);
}

if (bus_state->port_c_suspend & (1 << wIndex))

Regards,
Nicolas

Attachment: signature.asc
Description: This is a digitally signed message part