RE: [tip:irq/core] genirq: Fix race on spurious interrupt detection

From: Thomas Gleixner
Date: Fri Oct 19 2018 - 14:41:41 EST


David,

On Fri, 19 Oct 2018, David Laight wrote:
> From: Lukas Wunner
> > Sent: 19 October 2018 16:34
> >
> > genirq: Fix race on spurious interrupt detection
> >
> > Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of
> > threaded irqs") made detection of spurious interrupts work for threaded
> > handlers by:
> >
> > a) incrementing a counter every time the thread returns IRQ_HANDLED, and
> > b) checking whether that counter has increased every time the thread is
> > woken.
>
> That seems horribly broken.
> What is it trying to achieve?
>
> There are (at least) two common cases where IRQ_HANDLED doesn't get returned.
> (Unless the driver always returns it to avoid the message.)
>
> 1) The IOW that causes the hardware to drop a level sensitive IRQ is posted
> on the bus (etc) and happens late enough that the IRQ line is still
> asserted when the iret executes.
> If this happens all the time you need to flush the IOW, but if only
> occasionally it doesn't matter and you don't want a message.
>
> 2) Typically an ethernet driver ISR has to enable the interrupt and then
> check the ring for work before returning from the interrupt.
> If a packet arrives at this time it might be processed by the 'old'
> ISR invocation but still generate another interrupt.
> If no more packets arrive the second ISR invocation will find no work.
> Again this is normal behaviour.
> (Deferring everything with NAPI might make this not happen - but other
> interrupts end up working the same way.)
>
> If you are really trying to detect 'stuck' interrupts then you probably
> want to count un-handled ones and zero the count on handled ones.
> I'm also pretty sure you don't need an atomic counter.
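
For illustration, the scheme suggested in the last quoted paragraph (count
unhandled interrupts, zero the count on a handled one) might look roughly
like the sketch below. It is a user-space sketch under stated assumptions:
the struct, the function name and the threshold field are hypothetical, and
it does not reproduce the kernel's spurious interrupt logic or its
interaction with threaded handlers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-line state; not a kernel structure. */
struct stuck_detector {
	unsigned int unhandled;  /* consecutive unhandled interrupts seen */
	unsigned int threshold;  /* assumed limit before reporting "stuck" */
};

/*
 * Record the outcome of one interrupt.
 * Returns true when 'threshold' unhandled interrupts have occurred in a
 * row, i.e. the line looks stuck. A single handled interrupt resets the
 * count, so occasional unhandled interrupts never trigger a report.
 */
static bool note_irq_result(struct stuck_detector *d, bool handled)
{
	assert(d);

	if (handled) {
		d->unhandled = 0;
		return false;
	}

	if (++d->unhandled < d->threshold)
		return false;

	/* Restart counting after reporting. */
	d->unhandled = 0;
	return true;
}
```

A plain counter is enough here only if every update for a given line is
serialized by the caller; that is an assumption of the sketch, not a
statement about the real code paths.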

Care to look at the logic which handles all of this including the
interaction with threaded interrupt handlers?

Thanks,

tglx