Re: [PATCHv2] musb_host: fix lockup on rxcsr_h_error

From: Maxim Uvarov
Date: Wed Apr 27 2016 - 14:26:17 EST


2016-04-27 18:46 GMT+03:00 Bin Liu <b-liu@xxxxxx>:
> Hi,
>
> On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
>> Fix soft lockup when resetting remote device attached
>> to usb host. Configuration:
>> pppd -> musb hub -> usb-serial -> gsm modem
>
> I have heard a few reports similar to this symptom, but never been able
> to reproduce it on my side.
>

Ok, I can reproduce it almost very easy.

>> When gsm modem resets, musb rolls in incoming rx interrupts
>> which does not give any time to other application as result
>> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
>
> Have you looked where exact place in the interrupt routine the execution
> has stuck in?
>

It does not stuck. It goes to that line which print proto error over
and over again and
nothing stops that. After some time kernel reports lockup. But
actually it's not stuck,
all cpu time was eaten by executing that handlers.


>> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
>> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
>
> MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures
> MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in
> musb_regs.h.
>
>>
>> Signed-off-by: Max Uvarov <muvarov@xxxxxxxxx>
>> ---
>> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
>>
>> drivers/usb/musb/musb_host.c | 12 +++++-------
>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>> index c3d5fc9..2d9aa78 100644
>> --- a/drivers/usb/musb/musb_host.c
>> +++ b/drivers/usb/musb/musb_host.c
>> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>
> What kernel do you use? This line # is away off from upstream kernel.
>

I did this patch for 4.1 but 4.6 has the same problem and patch
cleanly applies to the
latest torvalds/linux.git v4.6-rc5. This interrupt handler has the
same code. And looks
like on 3.14 everything worked. I don't have a time to diff 2
versions. Might be regression.


>>
>> /* stall; record URB status */
>> status = -EPIPE;
>> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
>>
>> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
>> -
>> - status = -EPROTO;
>> - musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> -
>> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>> + if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> + status = -EPROTO;
>> + musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> + }
>
> Please help me to understand how this change fixes the issue. I see the
> most effect of the change here is directly 'goto finish' so that 'done'
> flag is not set, then musb_advance_schedule() is not called. Is this the
> case or I missed other important pieces?
>

Right that is the goal. On this rxcsr_h_error kernel reschedules
current interrupt.
And that continues forever. For example adding msleep() can give some
time for other
processes. I'm not an expert in this chip but I think that right
solution in that case is not
try to reschedule and quick and allow hub to make reset and once again
init all devices
(in my case ppp/pppd also shutdowns and then I bring everything up
with script.). The
same behavior with dma and pio mode.

Regards,
Max.

> Thanks,
> -Bin.
>
>>
>> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
Best regards,
Maxim Uvarov