RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb requests

From: Anurag Kumar Vulisha
Date: Mon Dec 03 2018 - 11:08:32 EST



Hi Alan,

>-----Original Message-----
>From: Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx]
>Sent: Monday, December 03, 2018 8:21 PM
>To: Anurag Kumar Vulisha <anuragku@xxxxxxxxxx>
>Cc: Felipe Balbi <balbi@xxxxxxxxxx>; Greg Kroah-Hartman
><gregkh@xxxxxxxxxxxxxxxxxxx>; Shuah Khan <shuah@xxxxxxxxxx>; Johan Hovold
><johan@xxxxxxxxxx>; Jaejoong Kim <climbbb.kim@xxxxxxxxx>; Benjamin
>Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>; Roger Quadros <rogerq@xxxxxx>; Manu
>Gautam <mgautam@xxxxxxxxxxxxxx>; martin.petersen@xxxxxxxxxx; Bart Van
>Assche <bvanassche@xxxxxxx>; Mike Christie <mchristi@xxxxxxxxxx>; Matthew
>Wilcox <willy@xxxxxxxxxxxxx>; Colin Ian King <colin.king@xxxxxxxxxxxxx>; linux-
>usb@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; v.anuragkumar@xxxxxxxxx;
>Thinh Nguyen <thinhn@xxxxxxxxxxxx>; Tejas Joglekar
><tejas.joglekar@xxxxxxxxxxxx>; Ajay Yugalkishore Pandey <APANDEY@xxxxxxxxxx>
>Subject: RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb requests
>
>On Mon, 3 Dec 2018, Anurag Kumar Vulisha wrote:
>
>> >First of all, if some sort of deadlock causes a transfer to fail to
>> >complete, the host is expected to cancel and restart it. Not the
>> >gadget.
>> >
>>
>> Thanks for spending your time in reviewing this patch. The deadlock
>> is a very rare case scenario and is happening because both the gadget
>> controller & host controllers get out of sync and are stuck waiting for the
>> relevant event. For example this issue is observed in stream protocol where
>> the gadget controller is waiting on Host controller to issue PRIME transaction
>> and Host controller is waiting on gadget to issue ERDY transaction. Since
>> the stream protocol is gadget driven, the host may not proceed further until it
>> receives a valid Start Stream (ERDY) transaction from gadget.
>
>That's not entirely true. Can't the host cancel the transfer and then
>restart it?
>

Yes the host can cancel the transfer. This issue originated from the endpoints using bulk
streaming protocol and may not occur with normal endpoints. AFAIK bulk streaming is
gadget driven, where the gadget is allowed to select which stream id transfer the host
should work on . Since the host doesn't have control on when the transfer would be
selected by gadget, it may wait for longer timeouts before cancelling the transfer.

>> Since the gadget
>> controller driver is aware that the controller is stuck , makes it responsible
>> to recover the controller from hang condition by restarting the transfer (which
>> triggers the controller FSM to issue ERDY to host).
>
>Isn't there a cleaner way to recover than by cancelling the request and
>resubmitting it?
>

dequeuing the request issues the stop transfer command to the controller, which
cleans all the hardware resource allocated for that endpoint. This also resets the
hardware FSMs for that endpoint . So, when re-queuing of the transfer happens
the controller starts allocating hardware resources again, thus avoiding the probability
of entering into the issue. I am not sure of other controllers, but for dwc3, issuing
the stop transfer is the only way to handle this issue.

@Felipe: Can you please provide your suggestion on this.

>> >Second, if a request timer expires and the request is cancelled, the
>> >gadget driver's completion handler will be called. This is not what
>> >you want if the UDC core is going to resubmit the request
>> >automatically.
>> >
>> >Third, if a request timer expires and the timer handler calls
>> >usb_ep_dequeue() followed immediately by usb_ep_queue_timeout(), the
>> >resubmit will probably fail because the dequeue won't have completed
>> >yet.
>> >
>> >Fourth, the patch contains a race between the timer expiring and the
>> >request completing.
>>
>> Thanks for correcting, I agree with you on all the above 3 cases that the
>> resubmission of the request should only be done from the class driver and
>> the udc core should simply dequeue the request on timeout. I am not sure
>> why I haven't seen any issue while testing on this patch series. I will modify
>> the code to handle the resubmitting of requests properly.
>
>How can the gadget driver know what timeout to use? The host is
>allowed to be as slow as it wants; the gadget driver doesn't have any
>way to tell when the host wants to start the transfer.

Yes , I agree with you that the timeout may vary from usage to usage. This timeout
should be decided by the class driver which queues the request. As discussed above
this issue was observed in streaming protocol , which is very much faster than normal
BOT modes and it works on super speed . More over the gadget controller decides
the selection of the stream id on which the host should work , so taking all these into
consideration I kept 50ms timeout for stream transfers, so that the performance may
not get decreased.

Thanks,
Anurag Kumar Vulisha