Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices

From: Alan Stern
Date: Tue Sep 01 2020 - 15:52:04 EST


On Tue, Sep 01, 2020 at 11:00:16AM -0600, Khalid Aziz wrote:
> On 9/1/20 10:36 AM, Alan Stern wrote:
> > On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
> >> On 8/31/20 8:31 PM, Alan Stern wrote:
> >>> Can you collect a usbmon trace showing an example of this problem?
> >>>
> >>
> >> I have attached usbmon traces for when USB hub with keyboards and mouse
> >> is plugged into USB 2.0 port and when it is plugged into the NEC USB 3.0
> >> port.
> >
> > The usbmon traces show lots of errors, but no Clear-TT events. The
> > large number of errors suggests that you've got a hardware problem;
> > either a bad hub or bad USB connections.
>
> That is what I thought initially which is why I got additional hubs and
> a USB 2.0 PCI card to test. I am seeing errors across 3 USB controllers,
> 4 USB hubs and 4 slow/full speed devices. All of the hubs and slow/full
> devices work with zero errors on my laptop. My keyboard/mouse devices
> and 2 of my USB hubs predate motherboard update and they all worked
> flawlessly before the motherboard upgrade. Some combinations of these
> also work with no errors on my desktop with the new motherboard that I had
> listed in my original email:

It's a very puzzling situation.

One thing which probably would work well, surprisingly, would be to buy
an old USB-1.1 hub and plug it into the PCI card. That combination is
likely to be similar to what you see when plugging the devices directly
into the PCI card. It might even work okay with the USB-3 controllers.

> 2. USB 2.0 controller - WORKS
> 5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
>
> I am not seeing a common failure here that would point to any specific
> hardware being bad. Besides, that one code change (which I still can't
> say is the right code change) in ehci-q.c makes USB 2.0 controller work
> reliably with all my devices.

The USB and EHCI designs are flawed in that under the circumstances
you're seeing, they don't have any way to tell the difference between a
STALL and a host timing error. The current code treats these situations
as timing/transmission errors (resulting in device resets); your change
causes them to be treated as STALLs. However, there are known, common
situations in which those same symptoms really are caused by
transmission errors, so we don't want to start treating them as STALLs.

Besides, I suspect that your code change does _not_ make the USB-2
controller work reliably with your devices. You should collect a usbmon
trace under those conditions; I predict it will be full of STALLs. And
furthermore, I believe these STALLs will not show up in a usbmon trace
made with the devices plugged directly into the PCI card. If I'm right
about these things, the errors are still present even with your patch;
all it does is hide them.
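
One way to check that prediction is to tally the URB status values on the callback lines of a trace. The following is only a minimal sketch, assuming the usbmon text format documented in the kernel tree (URB tag, timestamp, event type, address word, then a status word whose first colon-separated field is the URB status). The function names and the status table are illustrative and not part of any existing tool. -EPIPE (-32) is how a STALL is reported, while -EPROTO (-71), -EILSEQ (-84) and -ETIME (-62) are the usual transmission or timing errors.

```python
from collections import Counter

# Linux errno values that commonly show up as URB status in usbmon output.
# This table is illustrative, not exhaustive.
STATUS_NAMES = {
    0: "OK",
    -32: "EPIPE (STALL)",
    -62: "ETIME",
    -71: "EPROTO",
    -84: "EILSEQ",
}


def callback_status(line):
    """Return the URB status of a usbmon callback ('C') line, else None."""
    fields = line.split()
    # Fields: URB tag, timestamp, event type, address word, status word, ...
    if len(fields) < 5 or fields[2] != "C":
        return None
    # The status word is "status" or "status:interval[:start_frame[:errors]]".
    first = fields[4].split(":")[0]
    try:
        return int(first)
    except ValueError:
        # Not a numeric status (for example a setup tag); ignore the line.
        return None


def count_statuses(lines):
    """Count callback statuses over an iterable of usbmon text lines."""
    counts = Counter()
    for line in lines:
        status = callback_status(line)
        if status is not None:
            counts[status] += 1
    return counts


def describe(counts):
    """Format the counts as 'status name: n' strings, sorted by status."""
    return [
        f"{status} {STATUS_NAMES.get(status, 'other')}: {n}"
        for status, n in sorted(counts.items())
    ]
```

Running `describe(count_statuses(open(path)))` on a saved trace (where `path` is wherever the trace was written) with and without the patch, and again with the devices plugged directly into the PCI card, would show whether the resets have merely turned into -32 results.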

Short of a USB bus analyzer, however, there's no way to tell what's
really going on.

Alan Stern