Re: XHCI vs PCM2903B/PCM2904 part 2

From: Rik van Riel
Date: Wed May 20 2020 - 15:53:36 EST


On Wed, 2020-05-20 at 16:50 +0300, Mathias Nyman wrote:
> On 20.5.2020 14.26, Rik van Riel wrote:
> > After a few more weeks of digging, I have come to the tentative
> > conclusion that either the XHCI driver, or the USB sound driver,
> > or both, fail to handle USB errors correctly.
> >
> > I have some questions at the bottom, after a (brief-ish)
> > explanation
> > of exactly what seems to go wrong.
> >
> > TL;DR: arecord from a misbehaving device can hang forever
> > after a USB error, due to poll on /dev/snd/timer never returning.
> >
> > The details: under some mysterious circumstances, the PCM290x
> > family sound chips can send more data than expected during an
> > isochronous transfer, leading to a babble error. Those
> > circumstances seem to in part depend on the USB host controller
> > and/or the electrical environment, since the chips work just
> > fine for most people.
> >
> > Receiving data past the end of the isochronous transfer window
> > scheduled for a device results in the XHCI controller throwing
> > a babble error, which moves the endpoint into halted state.
> >
> > This is followed by the host controller software sending a
> > reset endpoint command, and moving the endpoint into stopped
> > state, as specified on pages 164-165 of the XHCI specification.
>
> Note that isoch endpoints should generate buffer overrun instead of
> babble detect error on TD babble conditions.
> See xHCI spec 6.4.5 additional note 115.

Maybe it should, but I'm hitting this printk in handle_tx_event:

case COMP_BABBLE_DETECTED_ERROR:
xhci_dbg(xhci, "Babble error for slot %u ep %u on
endpoint\n",
slot_id, ep_index);
status = -EOVERFLOW;
break;

Tracing in the sound driver suggests that packet belongs to
the sound data device (not the sync device), because the
URB with -EOVERFLOW status for the isochronous packet in that
URB is the last one seen there.

> Or maybe a frame babble could halt an isoc endpoint, see xhci
> 4.10.2.4.1
> but then you should see a port error and port going to disabled
> state.
>
> Any logs of this?
>
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> < trigger the issue >
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace

dmesg:
https://drive.google.com/open?id=1S2Qc8lroqA5-RMukuLBLWFGx10vEjG-i

usbtrace:
https://drive.google.com/open?id=1cbLcOnAtQRW0Chgak6PNC0l4yJv__4uO

> > However, the USB sound driver seems to have no idea that this
> > error happened. The function retire_capture_urb looks at the
> > status of each isochronous frame, but seems to be under the
> > assumption that the sound device just keeps on running.
> >
> > The function snd_complete_urb seems to only detect that the
> > device is not running if usb_submit_urb returns a failure.
> >
> > err = usb_submit_urb(urb, GFP_ATOMIC);
> > if (err == 0)
> > return;
> >
> > usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n",
> > err);
> >
> > if (ep->data_subs && ep->data_subs->pcm_substream) {
> > substream = ep->data_subs->pcm_substream;
> > snd_pcm_stop_xrun(substream);
> > }
> >
> > However, the XHCI driver will happily submit an URB to a
> > stopped device. Looking at the call trace usb_submit_urb ->
> > xhci_urb_enqueue -> xhci_queue_isoc_tx_prepare -> prepare_ring,
> > you can see this code:
> >
> > /* Make sure the endpoint has been added to xHC schedule */
> > switch (ep_state) {
> > ...
> > case EP_STATE_HALTED:
> > xhci_dbg(xhci, "WARN halted endpoint, queueing URB
> > anyway.\n");
> > case EP_STATE_STOPPED:
> > case EP_STATE_RUNNING:
> > break;
> >
> > This leads me to a few questions:
> > - should retire_capture_urb call snd_pcm_stop_xrun,
> > or another function like it, if it sees certain
> > errors in the iso frame in the URB?
> > - should snd_complete_urb do something with these
> > errors, too, in case they happen on the sync frames
> > and not the data frames?
> > - does the XHCI code need to ring the doorbell when
> > submitting an URB to a stopped device, or is it
> > always up to the higher-level driver to fully reset
> > the device before it can do anything useful?
>
> xhci driver should ring the doorbell.
>
> xhci_queue_isoc_tx()
> giveback_first_trb()
> xhci_ring_ep_doorbell()
>
> when we are talking about babble or transaction errors the device
> might be
> completely unaware of the situation. Device side of the endpoint is
> not necessarily halted.
> xHC host will only halt its internal endpoint state, and it needs a
> reset
> endopoint command from the xhci driver to clear the internal halt
> state.

Interesting. I had no idea the endpoint on each side
was controlled separately.

--
All Rights Reversed.

Attachment: signature.asc
Description: This is a digitally signed message part