Re: 2.6.19-rc6: known regressions (v4)

From: Linus Torvalds
Date: Tue Nov 21 2006 - 17:23:24 EST




On Tue, 21 Nov 2006, Vivek Goyal wrote:

> On Tue, Nov 21, 2006 at 10:24:24PM +0100, Adrian Bunk wrote:
> > This email lists some known regressions in 2.6.19-rc6 compared to 2.6.18
> > that are not yet fixed in Linus' tree.
> >
> > If you find your name in the Cc header, you are either submitter of one
> > of the bugs, maintainer of an affectected subsystem or driver, a patch
> > of you caused a breakage or I'm considering you in any other way possibly
> > involved with one or more of these issues.
> >
> > Due to the huge amount of recipients, please trim the Cc when answering.
> >
> >
> > Subject : kernel hangs when booting with irqpoll
> > References : http://lkml.org/lkml/2006/11/20/233
> > Submitter : Vivek Goyal <vgoyal@xxxxxxxxxx>
> > Caused-By : Pavel Emelianov <xemul@xxxxxxxxxx>
> > commit f72fa707604c015a6625e80f269506032d5430dc
> > Handled-By : Vivek Goyal <vgoyal@xxxxxxxxxx>
> > Status : problem is being debugged
> >
>
> Adrian,
>
> Pavel already provided a fix for this issue.
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116409933100117&w=2

I really think this is wrong.

The original patch was wrong, and the _real_ problem is in __do_IRQ() that
got the desc->lock too early.

I _think_ the correct fix is to simply revert the broken commit, and fix
the _one_ place that called "misnote_interrupt()" with the lock held.

Something like this..

I also think that the real fix will be to move the whole

if (!noirqdebug)
note_interrupt(irq, desc, action_ret);


into handle_IRQ_event itself, since every caller (except for
"misrouted_irq()" itself, and that should probably be done separately)
should always do it. Right now we have a lot of people that just do

action_ret = handle_IRQ_event(irq, action);
if (!noirqdebug)
note_interrupt(irq, desc, action_ret);

explicitly.

The only thing that keeps us from doing that is that we don't pass in
"desc", but we should just do that.

But in the meantime, this appears to be the minimal fix. Can people please
test and verify?

Linus

---
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 42aa6f1..a681912 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -231,10 +231,10 @@ fastcall unsigned int __do_IRQ(unsigned
spin_unlock(&desc->lock);

action_ret = handle_IRQ_event(irq, action);
-
- spin_lock(&desc->lock);
if (!noirqdebug)
note_interrupt(irq, desc, action_ret);
+
+ spin_lock(&desc->lock);
if (likely(!(desc->status & IRQ_PENDING)))
break;
desc->status &= ~IRQ_PENDING;
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index 9c7e2e4..543ea2e 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -147,11 +147,7 @@ void note_interrupt(unsigned int irq, st
if (unlikely(irqfixup)) {
/* Don't punish working computers */
if ((irqfixup == 2 && irq == 0) || action_ret == IRQ_NONE) {
- int ok;
-
- spin_unlock(&desc->lock);
- ok = misrouted_irq(irq);
- spin_lock(&desc->lock);
+ int ok = misrouted_irq(irq);
if (action_ret == IRQ_NONE)
desc->irqs_unhandled -= ok;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/