Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

From: Thinh Nguyen
Date: Tue Jan 10 2023 - 19:01:32 EST


On Tue, Jan 10, 2023, Linyu Yuan wrote:
>
> On 1/10/2023 11:13 AM, Linyu Yuan wrote:
> >
> > On 1/10/2023 11:05 AM, Linyu Yuan wrote:
> > >
> > > On 1/10/2023 10:53 AM, Thinh Nguyen wrote:
> > > > On Tue, Jan 10, 2023, Linyu Yuan wrote:
> > > > > On 1/10/2023 2:28 AM, Thinh Nguyen wrote:
> > > > > > On Fri, Jan 06, 2023, Linyu Yuan wrote:
> > > > > > > On 1/5/2023 5:54 PM, 정재훈 wrote:
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Linyu Yuan [mailto:quic_linyyuan@xxxxxxxxxxx]
> > > > > > > > > Sent: Thursday, January 5, 2023 12:35 PM
> > > > > > > > > To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
> > > > > > > > > Cc: open list:USB XHCI DRIVER; open list;
> > > > > > > > > Seungchull Suh; Daehwan Jung
> > > > > > > > > Subject: Re: [PATCH] usb: dwc3: Clear
> > > > > > > > > DWC3_EVENT_PENDING when count is 0
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 1/5/2023 11:29 AM, Linyu Yuan wrote:
> > > > > > > > > > On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> > > > > > > > > > > Sometimes very rarely, The count is
> > > > > > > > > > > 0 and the DWC3 flag is set has
> > > > > > > > > > > status.
> > > > > > > > > > > It must not have these status.
> > > > > > > > > > > Because, It can make happen
> > > > > > > > > > > interrupt
> > > > > > > > > > > storming status.
> > > > > > > > > > could you help explain without clear the
> > > > > > > > > > flag, how interrupt storming
> > > > > > > > > > happen ?
> > > > > > > > > >
> > > > > > > > > > as your change didn't touch any hardware
> > > > > > > > > > register, i don't know how it
> > > > > > > > > > fix storming.
> > > > > > > > > >
> > > > > > > > H/W interrupts are still occur on IP.
> > > > > > > I guess we should fix it from IP layer.
> > > > > > >
> > > > > > How are you certain the problem is from IP layer?
> > > > > I think all IRQ is from DWC3 controller IP. if it is not IP
> > > > > layer, could you
> > > > > share how to prevent from SW layer ?
> > > > >
> > > > > seem IRQ can happen when event count is zero ,  why this can
> > > > > happen ? does
> > > > > it mean event count register is not trust ?
> > > > When the interrupt is unmasked, the controller will generate interrupts
> > > > as events are received. Normally, the flag checking for pending event
> > > > should be cleared before unmasking the interrupt, but we clear it after
> > > > to account for possible false interrupt due to the nature of legacy pci
> > > > interrupt. This exposes a gap where the interrupts can come but
> > > > the flag
> > > > isn't cleared. While it should be rare and shouldn't be too much of a
> > > > problem, we can avoid this scenario with some additional checks.
> > > >
> > > > > > > but when checking DWC3_EVENT_PENDING flag, it will
> > > > > > > auto clear in dwc3 thread
> > > > > > > irq handler.
> > > > > > >
> > > > > > > there is one possible root cause is it cleared only
> > > > > > > after irq enabled in
> > > > > > > dwc3_process_event_buf(),
> > > > > > >
> > > > > > > we should move unmask irq operation at end of this function.
> > > > > > >
> > > > > > This interrupt storm can happen because we clear the
> > > > > > evt->flags _after_
> > > > > > we unmask the interrupt. This was done to prevent false
> > > > > > interrupt from
> > > > > > delay interrupt deassertion, which can be a problem for legacy pci
> > > > > > interrupt.
> > > > > thanks for explain, i didn't know that.
> > > > > > see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
> > > > > >
> > > > > > The change JaeHun Jung did should be fine.
> > > > > agree.
> > > > The change may still need some additional check as suggested in my
> > > > response:
> > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20230109190914.3blihjfjdcszazdd@xxxxxxxxxxxx/T/*m7b907aa6da4023cb20fa00a57813d31fd84e081f__;Iw!!A4F2R9G_pg!dchc0UAeZnm7PcA-aav_58rsw7agMygwPSGC_kN8Ga_BS3oTbLMNh3DkfQ9B2njbva-tJzTj7Cb2dJnE5RbEW6KJnDmtFA$
> > > >
> > >  do you think we need to read event count before checking
> > > DWC3_EVENT_PENDING  ?
> > sorry for this noise, may be i have a little understanding of the legcy
> > pci issue now.
> one more question, is it legacy PCIe device still exist in real world ? and
> any VID/PID info ?

Currently, all dwc3 PCIe devices are affected. Some setups are more
noticeable than others. The dwc3 driver is implemented to probe platform
devices. So, dwc3 PCIe devices are wrapped as platform devices for the
dwc3 driver. Since we're going through the platform device code path,
the pci layer falls back to using legacy interrupt instead of MSI (last
I check awhile ago).

A little more detail on this problem:
PCIe legacy interrupt will emulate interrupt line by sending an
interrupt assert and deassert messages. After the interrupt assert
message is sent, interrupts are continuously generated until the
deassert message is sent. If there's a register write to unmask/mask
interrupt or clearing events falls in between these messages, then there
may be a race.

Let's say we don't have event pending check, this can happen:

Normal scenario
---------------
event_count += n # controller generates new events
interrupt asserts
write(mask irq)
event_count -= n # dwc3 clears events
interrupt deasserts
write(unmask irq)


Race scenario
-------------
event_count += n # new events
interrupt asserts
write(mask irq)
event_count -= n # clear events
event_count += n # more events come and hard irq handler gets called
# again as interrupt is generated, but cached
# events haven't been handled. This breaks
# serialization and causes lost events.
write(mask irq)

event_count -= n # clear events
interrupt deasserts
write(unmask irq) # events handled


For MSI, this won't be a problem because it's edge-triggered and the way
it sends interrupt is different.

BR,
Thinh