Re: [PATCH v3 09/11] i2c: npcm: Handle spurious interrupts

From: Avi Fishman
Date: Sun Apr 10 2022 - 03:33:36 EST


On Tue, Apr 5, 2022 at 10:13 AM Andy Shevchenko
<andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, Apr 04, 2022 at 08:03:44PM +0300, Avi Fishman wrote:
> > On Thu, Mar 3, 2022 at 4:14 PM Andy Shevchenko
> > <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
> > > On Thu, Mar 03, 2022 at 02:48:20PM +0200, Tali Perry wrote:
> > > > > On Thu, Mar 3, 2022 at 12:37 PM Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
> > > > > > On Thu, Mar 03, 2022 at 04:31:39PM +0800, Tyrone Ting wrote:
> > > > > > > From: Tali Perry <tali.perry1@xxxxxxxxx>
> > > > > > >
> > > > > > > In order to better handle spurious interrupts:
> > > > > > > 1. Disable incoming interrupts in master only mode.
> > > > > > > 2. Clear end of busy (EOB) after every interrupt.
> > > > > > > 3. Return correct status during interrupt.
> > > > > >
> > > > > > This is bad commit message, it doesn't explain "why" you are doing these.
> > >
> > > ...
> > >
> > > > BMC users connect a huge tree of i2c devices and muxes.
> > > > This tree suffers from spikes, noise and double clocks.
> > > > All these may cause spurious interrupts to the BMC.
>
> (1)
>
> > > > If the driver gets an IRQ which was not expected and was not handled
> > > > by the IRQ handler,
> > > > there is nothing left to do but to clear the interrupt and move on.
> > >
> > > Yes, the problem is what "move on" means in your case.
> > > If you get a spurious interrupts there are possibilities what's wrong:
> > > 1) HW bug(s)
> > > 2) FW bug(s)
> > > 3) Missed IRQ mask in the driver
> > > 4) Improper IRQ mask in the driver
> > >
> > > The below approach seems incorrect to me.
> >
> > Andy, What about this explanation:
> > On rare cases the i2c gets a spurious interrupt which means that we
> > enter an interrupt but in
> > the interrupt handler we don't find any status bit that points to the
> > reason we got this interrupt.
> > This may be a rare case of HW issue that is still under investigation

About 1 to 100,000 transactions

> > In order to overcome this we are doing the following:
> > 1. Disable incoming interrupts in master mode only when slave mode is
> > not enabled.
> > 2. Clear end of busy (EOB) after every interrupt.
> > 3. Clear other status bits (just in case since we found them cleared)
> > 4. Return correct status during the interrupt that will finish the transaction.
> > On next xmit transaction if the bus is still busy the master will
> > issue a recovery process before issuing the new transaction.
>
> This sounds better, thanks.
>
> One thing to clarify, the (1) states that the HW "issue" is known and becomes a
> PCB level one, i.e. noisy environment that has not been properly shielded.
> So, if it is known, please put the reason in the commit message.
>

The HW issue is not known yet, we see it on few platforms and in other
platforms we don't, so the first assumption was this.
So eventually we don't want to claim this without proving it.

> Also would be good to see numbers of "rare". Is it 0.1%?

I added above the known statistics.

>
> > > > If the transaction failed, driver has a recovery function.
> > > > After that, user may retry to send the message.
> > > >
> > > > Indeed the commit message doesn't explain all this.
> > > > We will fix and add to the next patchset.
> > > >
> > > > > > > + /*
> > > > > > > + * if irq is not one of the above, make sure EOB is disabled and all
> > > > > > > + * status bits are cleared.
> > > > > >
> > > > > > This does not explain why you hide the spurious interrupt.
> > > > > >
> > > > > > > + */
>
> --
> With Best Regards,
> Andy Shevchenko
>
>


--
Regards,
Avi