[RFC] [PATCH] Fix misrouted interrupts deadlocks

From: Pavel Emelianov
Date: Fri Nov 10 2006 - 09:00:58 EST


While testing a kernel booted with the "irqpoll" option
I hit the following lockup:

__do_IRQ()
    spin_lock(&desc->lock);
    desc->chip->ack(); /* IRQ is ACKed */
    note_interrupt()
        misrouted_irq()
            handle_IRQ_event()
                if (...)
                    local_irq_enable_in_hardirq();
                /* interrupts are enabled from now on */
                ...
                __do_IRQ() /* same IRQ we've started from */
                    spin_lock(&desc->lock); /* LOCKUP */

Looking at the misrouted_irq() code, I found that another
potential deadlock can also take place:

CPU 1:
__do_IRQ()
    spin_lock(&desc->lock); /* irq = A */
    misrouted_irq()
        for (i = 1; i < NR_IRQS; i++) {
            spin_lock(&desc->lock); /* irq = B */
            if (desc->status & IRQ_INPROGRESS) {

CPU 2:
__do_IRQ()
    spin_lock(&desc->lock); /* irq = B */
    misrouted_irq()
        for (i = 1; i < NR_IRQS; i++) {
            spin_lock(&desc->lock); /* irq = A */
            if (desc->status & IRQ_INPROGRESS) {

Because the second lock on both CPUs is taken before checking
whether that IRQ is being handled on another processor, each CPU
can end up spinning on the lock the other one holds, which is
a deadlock. This second issue is only theoretical.
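
To make the lock ordering above concrete, here is a small userspace
sketch in C. It is not kernel code: struct toy_desc, toy_trylock(),
toy_unlock() and toy_abba_stuck_cpus() are hypothetical names made up
for this illustration, and the two CPUs are replayed step by step in
a single thread so that the outcome is deterministic. A try-lock is
used in place of spin_lock() so the "would spin forever" case can be
counted instead of actually hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct irq_desc: a lock and a status bit. */
struct toy_desc {
	atomic_flag lock;	/* set while the "spinlock" is held */
	bool in_progress;	/* models IRQ_INPROGRESS */
};

/* Try once to take the lock; true on success, false if already held. */
static bool toy_trylock(struct toy_desc *d)
{
	return !atomic_flag_test_and_set(&d->lock);
}

static void toy_unlock(struct toy_desc *d)
{
	atomic_flag_clear(&d->lock);
}

/*
 * Replay the interleaving from the two traces above and return how
 * many of the two simulated CPUs would be stuck on their second lock.
 */
static int toy_abba_stuck_cpus(void)
{
	struct toy_desc a = { ATOMIC_FLAG_INIT, false };
	struct toy_desc b = { ATOMIC_FLAG_INIT, false };
	int stuck = 0;
	bool got;

	got = toy_trylock(&a);		/* CPU 1: __do_IRQ(), irq = A */
	assert(got);
	a.in_progress = true;

	got = toy_trylock(&b);		/* CPU 2: __do_IRQ(), irq = B */
	assert(got);
	b.in_progress = true;
	(void)got;

	/*
	 * Both CPUs now scan the other descriptor. The in_progress check
	 * would let them skip it, but it sits behind the lock, so it is
	 * never reached.
	 */
	if (!toy_trylock(&b))		/* CPU 1: wants B, held by CPU 2 */
		stuck++;
	if (!toy_trylock(&a))		/* CPU 2: wants A, held by CPU 1 */
		stuck++;

	toy_unlock(&a);
	toy_unlock(&b);
	return stuck;
}
```

In this replay neither second lock attempt succeeds, so with a blocking
spin_lock() both CPUs would wait on each other indefinitely.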

I propose the attached patch to fix both problems: drop the
held desc->lock while trying to handle a misrouted IRQ, and
retake it afterwards.
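
The effect of dropping the lock around the call can be sketched in
userspace C as follows. This is only an illustration of the pattern
under stated assumptions, not the kernel code: struct demo_desc,
demo_trylock(), demo_unlock(), demo_nested_irq() and demo_outer_irq()
are hypothetical names, the nested interrupt is modelled as a plain
function call, and a try-lock stands in for spin_lock() so that the
lockup shows up as a failed attempt rather than a hang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct irq_desc. */
struct demo_desc {
	atomic_flag lock;	/* set while the "spinlock" is held */
	int handled;		/* nested entries that made progress */
};

/* Try once to take the lock; true on success, false if already held. */
static bool demo_trylock(struct demo_desc *d)
{
	return !atomic_flag_test_and_set(&d->lock);
}

static void demo_unlock(struct demo_desc *d)
{
	atomic_flag_clear(&d->lock);
}

/*
 * Nested entry for the same IRQ, as can happen once a handler has
 * re-enabled interrupts. Returns false where a real spin_lock()
 * would spin forever.
 */
static bool demo_nested_irq(struct demo_desc *d)
{
	if (!demo_trylock(d))
		return false;
	d->handled++;
	demo_unlock(d);
	return true;
}

/*
 * Outer entry: takes the lock, then runs the path that may re-enter.
 * With drop_lock set, the lock is released around that path and
 * retaken afterwards, as the patch does around misrouted_irq().
 */
static bool demo_outer_irq(struct demo_desc *d, bool drop_lock)
{
	bool nested_ok;
	bool got = demo_trylock(d);

	assert(got);
	if (drop_lock)
		demo_unlock(d);

	nested_ok = demo_nested_irq(d);

	if (drop_lock) {
		got = demo_trylock(d);
		assert(got);
	}
	(void)got;
	demo_unlock(d);
	return nested_ok;
}
```

Calling demo_outer_irq() with drop_lock false reproduces the first
lockup (the nested entry cannot take the lock); with drop_lock true
the nested entry completes. The sketch does not model what else may
change in the descriptor while the lock is dropped.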

Please comment.
--- ./kernel/irq/spurious.c.irqlockup 2006-11-09 11:19:10.000000000 +0300
+++ ./kernel/irq/spurious.c 2006-11-10 16:53:38.000000000 +0300
@@ -147,7 +147,11 @@ void note_interrupt(unsigned int irq, st
 	if (unlikely(irqfixup)) {
 		/* Don't punish working computers */
 		if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE) {
-			int ok = misrouted_irq(irq);
+			int ok;
+
+			spin_unlock(&desc->lock);
+			ok = misrouted_irq(irq);
+			spin_lock(&desc->lock);
 			if (action_ret == IRQ_NONE)
 				desc->irqs_unhandled -= ok;
 		}