Re: [PATCH v2] usb: dwc3: gadget: Avoid canceling current request for queuing error

From: Wesley Cheng
Date: Wed May 05 2021 - 14:02:09 EST




On 5/5/2021 8:19 AM, Alan Stern wrote:
> On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes:
>>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote:
>>>> Hi,
>>>>
>>>> Wesley Cheng wrote:
>>>>> If an error is received when issuing a start or update transfer
>>>>> command, the error handler will stop all active requests (including
>>>>> the current USB request), and call dwc3_gadget_giveback() to notify
>>>>> function drivers of the requests which have been stopped. Avoid
>>>>> having to cancel the current request which is trying to be queued, as
>>>>> the function driver will handle the EP queue error accordingly.
>>>>> Simply unmap the request as it was done before, and allow previously
>>>>> started transfers to be cleaned up.
>>>>>
>>>
>>> Hi Thinh,
>>>
>>>>
>>>> It looks like you're still letting dwc3 stopping and cancelling all the
>>>> active requests instead letting the function driver doing the dequeue.
>>>>
>>>
>>> Yeah, main issue isn't due to the function driver doing dequeue, but
>>> having cleanup (ie USB request free) if there is an error during
>>> usb_ep_queue().
>>>
>>> The function driver in question at the moment is the f_fs driver in AIO
>>> mode. When async IO is enabled in the FFS driver, every time it queues
>>> a packet, it will allocate a io_data struct beforehand. If the
>>> usb_ep_queue() fails it will free this io_data memory. Problem is that,
>>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS
>>> driver will also schedule a work item (within io_data struct) to handle
>>> the completion. So you end up with a flow like below
>>>
>>> allocate io_data (ffs)
>>> --> usb_ep_queue()
>>> --> __dwc3_gadget_kick_transfer()
>>> --> dwc3_send_gadget_ep_cmd(EINVAL)
>>> --> dwc3_gadget_ep_cleanup_cancelled_requests()
>>> --> dwc3_gadget_giveback(ECONNRESET)
>>> ffs completion callback
>>> queue work item within io_data
>>> --> usb_ep_queue returns EINVAL
>>> ffs frees io_data
>>> ...
>>>
>>> work scheduled
>>> --> NULL pointer/memory fault as io_data is freed

Hi Alan,
>
> Am I reading this correctly? It looks like usb_ep_queue() returns a
> -EINVAL error, but then the completion callback gets invoked with a
> status of -ECONNRESET.
>
Yes, your understanding of the current behavior is correct. In this
situation we'd get a completion callback with ECONNRESET, and then also
return EINVAL to usb_ep_queue().

>> I have some vague memory of discussing (something like) this with Alan
>> Stern long ago and the conclusion was that the gadget driver should
>> handle cases such as this.
>
> Indeed, this predates the creation of the Gadget API; the same design
> principle applies to the host-side API. It's a very simple idea:
>
> If an URB or usb_request submission succeeds, it is guaranteed
> that the completion routine will be called when the transfer is
> finished, cancelled, aborted, or whatever (note that this may
> happen before the submission call returns).
>
> If an URB or usb_request submission fails, it is guaranteed that
> the completion routine will not be called.
>
> So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the
> completion handler is called), this is a bug

Thanks for this. So we're trying to only allow one or the other to
happen right now. (either completion with an error, or returning error
for usb_ep_queue()) I think that would be OK then.

Thanks
Wesley Cheng

>
> Alan Stern
>
>> OTOH, we're returning failure during
>> usb_ep_queue() which tells me there's something with dwc3 (perhaps not
>> exclusively, but that's yet to be shown).
>>
>> If I understood the whole thing correctly, we want everything except the
>> current request (the one that failed START or UPDATE transfer) to go
>> through giveback(). This really tells me that we're not handling error
>> case in kick_transfer and/or prepare_trbs() correctly.
>>
>> I also don't want to pass another argument to kick_transfer because it
>> should be unnecessary: the current request should *always* be the last
>> one in the list. Therefore we should rely on something like
>> list_last_entry() followed by list_for_each_entry_safe_reverse() to
>> handle this without a special case.
>>
>> ret = dwc3_send_gadget_ep_cmd();
>> if (ret < 0) {
>> current = list_last_entry();
>>
>> unmap(current);
>> for_each_trb_in(current) {
>> clear_HWO(trb);
>> }
>>
>> list_for_entry_safe_reverse() {
>> move_cancelled();
>> }
>> }
>>
>> --
>> balbi
>
>

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project