Re: Catching NForce2 lockup with NMI watchdog

From: cheuche+lkml
Date: Fri Dec 05 2003 - 15:19:05 EST


On Fri, Dec 05, 2003 at 11:11:39AM -0800, Allen Martin wrote:
>
> Likely the root of the problem has to do with the way the Linux kernel is
> using the ACPI methods to setup the interrupts which is different from win
> 9x/2k/XP. I can help track this down, unfortunately so far I've been unable
> to reproduce the hangs on any of the boards I have.
>
With a little patch in arch/i386/kernel/mpparse.c in the acpi section, I
managed to get the timer interrupt back on IO-APIC-edge, maybe the nmi
watchdog could work with the ioapic then ?

With the patch, the interrupt flood on IRQ7 I reported on the nvidia2
lockups thread also disappeared, but then I noticed something odd when
there is ide activity :
With amd74xx/nforce driver, I can almost instantly hang the machine
(nothing new there), but with the generic ide driver and the IO load a
cat /dev/hda > /dev/null can do, timer interrupts don't seem to get
through easily. I first thought the box freezed but I realized the
software cursor was blinking *very* slowly. In fact 1 second for the
kernel took about 12 seconds. Stopping the IO load on ide and
everything seems back to normal.

There may be something wrong with the timer using apic and the
amd/nforce ide driver does not handle this situation that should not
occur and juste freezes. This is pure speculation of course.

I looked in mpparse.c because this is where I noticed the difference
about the timer interrupt setup with apic between 2.4.22 and 2.4.23.
However it is in the path of ACPI source interrupt override, maybe the
modification I made just overrides the override (sigh).

*Disclaimer*
The modification is certainly not the proper fix, does a wrong thing,
but it shows an interesting behavior, especially it fixed the
interrupt flood on IRQ7 I and some others are able to see.

Here the little patch of arch/i386/kernel/mpparse.c I used :

--- mpparse.c.old 2003-12-05 14:42:10.000000000 +0100
+++ mpparse.c 2003-12-05 14:43:41.000000000 +0100
@@ -962,7 +962,8 @@
*/
for (i = 0; i < mp_irq_entries; i++) {
if ((mp_irqs[i].mpc_dstapic == intsrc.mpc_dstapic)
- && (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)) {
+ && (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)
+ && (mp_irqs[i].mpc_irqtype == intsrc.mpc_irqtype)) {
mp_irqs[i] = intsrc;
found = 1;
break;



I hope this helps,

Mathieu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/