Re: NMI watchdog detected lockup

From: Andi Kleen
Date: Mon Oct 18 2004 - 13:02:07 EST


On Mon, 18 Oct 2004 10:13:11 -0700
"Randy.Dunlap" <rddunlap@xxxxxxxx> wrote:

> Marc Bevand wrote:
> > On 2004-10-17, Randy.Dunlap <rddunlap@xxxxxxxx> wrote:
> > |
> > | I'm seeing this often during a kernel build on AIC79xx.
> > | I did one kernel build on SATA without seeing this.
> > | This is on a dual-Opteron IBM Workstation A with
> > | 2 GB RAM, SATA, & SCSI.
> > | [...]
> > | NMI Watchdog detected LOCKUP on CPU0, registers:
> > | [...]
> >
> > You are not the first one to observe frequent watchdog timeout
> > lockup on dual Opteron systems during intense I/O operations,
> > see this thread:
> >
> > http://thread.gmane.org/gmane.linux.ide/1933
> >
> > Note: this does *not* seem to be SATA-related.
>
> Hi,
>
> Zwane suspected NMI spikes and advised me to disable nmi_watchdog
> (nmi_watchdog=0). After doing that, a kernel build completes
> successfully, although with many messages like these:
>
> Uhhuh. NMI received for unknown reason 21.

Something on your system creates bogus NMI interrupts. What chipset
are you using exactly?

Sometimes chipsets can be programmed to raise NMIs when a PCI bus
error occurs.

21 is the normal state (PIT timer running, but no errors logged).

If you have an AMD 8131 it could in theory be erratum 54, but then
normally one of the error bits in the reason should be set.
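
To illustrate what "error bits in the reason" means, here is a minimal
sketch of decoding the reason byte. It assumes the legacy PC layout of
the NMI status byte read from I/O port 0x61 and that the kernel prints
the reason in hex, so "21" is 0x21. The macro and function names are
made up for this example and are not the kernel's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed legacy PC layout of the NMI status byte (I/O port 0x61). */
#define NMI_REASON_SERR        0x80 /* memory parity / PCI SERR# error */
#define NMI_REASON_IOCHK       0x40 /* I/O channel check error */
#define NMI_REASON_TIMER2_OUT  0x20 /* PIT timer 2 output */
#define NMI_REASON_TIMER2_GATE 0x01 /* PIT timer 2 gate enabled */

/* True when at least one of the two error bits is set. */
bool nmi_reason_has_error(unsigned char reason)
{
    return (reason & (NMI_REASON_SERR | NMI_REASON_IOCHK)) != 0;
}

/* Hypothetical helper: format a one-line summary of the byte into buf. */
void nmi_reason_describe(unsigned char reason, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len,
             "reason %02x: serr=%d iochk=%d timer2_out=%d timer2_gate=%d",
             (unsigned)reason,
             (reason & NMI_REASON_SERR) != 0,
             (reason & NMI_REASON_IOCHK) != 0,
             (reason & NMI_REASON_TIMER2_OUT) != 0,
             (reason & NMI_REASON_TIMER2_GATE) != 0);
}
```

With reason 0x21 only the two timer bits are set and
nmi_reason_has_error() returns false, which is why the NMI gets
reported as "unknown reason" rather than as a parity or I/O check
error.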

-Andi
