RE: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs

From: Yang, Fei
Date: Thu Jan 23 2020 - 13:29:04 EST


>>>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather
>>>>>> support, we have seen problems with adb connections stalling and
>>>>>> ceasing to function on hardware with dwc3 usb controllers.
>>>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
>>>>>
>>>>> Any chance this:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=testing/next&id=f63333e8e4fd63d8d8ae83b89d2c38cf21d64801
>>>> This is a different issue. I have tried initializing num_sgs when debugging this adb stall problem, but it didn't help.
>>>
>>> So multiple folks have run through this problem, but not *one* has tracepoints collected from the issue? C'mon guys.
>>> Can someone, please, collect tracepoints so we can figure out what's actually going on?
>>>
>>> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
>>
>> I sent you the tracepoints a long time ago, along with my analysis of the
>> problem (BTW, I don't think the tracepoints helped much). It's
>> basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().
>
> AFAICT, this is caused by the DMA API merging pages together when mapping an sglist for DMA. While doing that,
> it does *not* move the SG_END flag which sg_is_last() checks.
>
> I consider that an oversight in the DMA API, wouldn't you? Why should DMA API users care whether pages were merged or not while mapping the sglist?
> We have for_each_sg() and sg_is_last() for a reason.
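
To make the scenario described above concrete, here is a small userspace sketch. It is only a model under stated assumptions: struct model_sg, model_map_sg(), model_end_index() and end_flag_seen() are hypothetical names invented for illustration, not the kernel's struct scatterlist or dma_map_sg(). It assumes the mapping step merges address-contiguous entries and leaves the end marker on the original last entry, as claimed in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a scatterlist entry. */
struct model_sg {
	unsigned long addr;     /* CPU-side start address */
	unsigned int len;       /* CPU-side length */
	unsigned long dma_addr; /* filled in by the model mapping */
	unsigned int dma_len;   /* filled in by the model mapping */
	bool end;               /* models the end-of-list marker */
};

/*
 * Model mapping: merge address-contiguous entries into one mapped
 * segment and return the number of mapped segments. The end marker is
 * deliberately left where it was (assumption taken from the thread).
 */
static int model_map_sg(struct model_sg *sg, int nents)
{
	int mapped = 0;

	assert(nents > 0);
	for (int i = 0; i < nents; i++) {
		struct model_sg *prev = mapped ? &sg[mapped - 1] : NULL;

		if (prev && prev->dma_addr + prev->dma_len == sg[i].addr) {
			prev->dma_len += sg[i].len; /* merge into previous segment */
		} else {
			sg[mapped].dma_addr = sg[i].addr;
			sg[mapped].dma_len = sg[i].len;
			mapped++;
		}
	}
	return mapped;
}

/* Index of the entry that carries the end marker, or -1 if none. */
static int model_end_index(const struct model_sg *sg, int nents)
{
	for (int i = 0; i < nents; i++)
		if (sg[i].end)
			return i;
	return -1;
}

/* Would a walk over only the mapped segments ever see the end marker? */
static bool end_flag_seen(const struct model_sg *sg, int mapped)
{
	for (int i = 0; i < mapped; i++)
		if (sg[i].end)
			return true;
	return false;
}

#ifdef MODEL_SG_DEMO
int main(void)
{
	/* Three contiguous 4 KiB buffers plus one separate buffer. */
	struct model_sg sg[4] = {
		{ .addr = 0x1000, .len = 0x1000 },
		{ .addr = 0x2000, .len = 0x1000 },
		{ .addr = 0x3000, .len = 0x1000 },
		{ .addr = 0x9000, .len = 0x1000, .end = true },
	};
	int mapped = model_map_sg(sg, 4);

	printf("mapped segments: %d\n", mapped);
	printf("end marker index: %d\n", model_end_index(sg, 4));
	printf("end marker seen in mapped walk: %d\n", end_flag_seen(sg, mapped));
	return 0;
}
#endif
```

Built with -DMODEL_SG_DEMO, the model maps four entries into two segments while the end marker stays on entry 3, so a consumer that walks only the mapped segments and waits for the marker never finds it. Whether the real DMA API behaves this way on the affected platforms is exactly what is under discussion here.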

Oops, my bad. Actually, I was talking about the other patch, not the one setting num_sgs = 0; I don't know if this patch is really needed, but from
what I remember the DMA API is setting up num_sgs properly. I agree that even if there is a problem initializing num_sgs, it should be fixed in the DMA API.

> I can try to dig into my old emails and resend, but that is a bit hard to find.
>
> Don't bother, I'm still not convinced we should fix this at the driver level when sg_is_last() should be working here,
> unless we should iterate over num_sgs instead of num_mapped_sgs, though I don't think that's the case since
> in that case we would have to chain buffers of size zero.

> --
> balbi