Re: [PATCH v2] usb: dwc3: Prevent indefinite sleep in _dwc3_set_mode during suspend/resume

From: Felipe Balbi
Date: Fri Mar 16 2018 - 08:25:24 EST



Hi,

Minas Harutyunyan <Minas.Harutyunyan@xxxxxxxxxxxx> writes:
>>>> On 09/03/18 14:47, Roger Quadros wrote:
>>>>> In the following test we get stuck by sleeping forever in _dwc3_set_mode()
>>>>> after which dual-role switching doesn't work.
>>>>>
>>>>> On dra7-evm's dual-role port,
>>>>> - Load g_zero gadget driver and enumerate to host
>>>>> - suspend to mem
>>>>> - disconnect USB cable to host and connect otg cable with Pen drive in it.
>>>>> - resume system
>>>>> - we sleep indefinitely in _dwc3_set_mode due to:
>>>>> dwc3_gadget_exit()->usb_del_gadget_udc()->udc_stop()->
>>>>> dwc3_gadget_stop()->wait_event_lock_irq()
>>>>>
>>>>> To fix this, instead of waiting indefinitely with wait_event_lock_irq(),
>>>>> we use wait_event_interruptible_lock_irq_timeout() and print
>>>>> an error message if there was a timeout.
>>>>>
>>>>> Signed-off-by: Roger Quadros <rogerq@xxxxxx>
>>>>
>>>> Thanks for picking this for -next.
>>>> Is it better to have this in v4.16-rc fixes?
>>>> and also stable? v4.12+
>>>
>>> Well, there was no "Fixes: foobar" or "Cc: stable" lines in the commit
>>> log ;-)
>>>
>>> The best we can do now is wait for -rc1 and manually send the commit to
>>> stable.
>>>
>>
>> That's fine. Thanks.
>>
>
> The same issue is seen in dwc3_gadget_ep_dequeue(), which also uses
> wait_event_lock_irq(); as a result, it waits forever.

How did this happen? During rmmod dwc3? Or, perhaps, after you unloaded
a gadget driver?

> Actually, to fix this issue I updated the condition of the wait function
> from:
> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> to:
> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING & DWC3_EP_ENABLED)

You're not fixing anything. You're essentially removing the entire
End Transfer pending logic. The whole idea is that we can disable the
endpoint and then wait for the End Transfer interrupt. When you add a
check for the endpoint being enabled, that code will never run and,
thus, will never wait for the End Transfer IRQ.
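
To make the point concrete, here is a minimal userspace sketch (not
driver code) comparing the two wait conditions. The macro names are
shortened and the bit positions are assumptions chosen for illustration;
the only property that matters is that the two flags are distinct bits.
ANDing two distinct single-bit masks always gives 0, so the proposed
condition is true for every value of flags and the wait returns
immediately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the dwc3 endpoint flags. The bit positions
 * are assumed for this sketch; all that matters is that they differ. */
#define EP_ENABLED              (1u << 0)
#define EP_END_TRANSFER_PENDING (1u << 7)

/* Original condition: true only once the pending bit has been cleared. */
static bool orig_cond(uint32_t flags)
{
	return !(flags & EP_END_TRANSFER_PENDING);
}

/* Proposed condition: PENDING & ENABLED is 0 for distinct bits, so the
 * whole expression is !0, i.e. always true, regardless of flags. */
static bool proposed_cond(uint32_t flags)
{
	return !(flags & EP_END_TRANSFER_PENDING & EP_ENABLED);
}
```

With flags = EP_END_TRANSFER_PENDING, orig_cond() is false (keep
waiting) while proposed_cond() is already true, which is why the change
makes the hang disappear without actually waiting for the IRQ.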

If you manage to find a more reliable way of reproducing this, then make
sure to capture dwc3 tracepoints (see the documentation for details) and
let's start trying to figure out what's going on.

cheers

--
balbi