Re: NMI watchdog detected lockup

From: Randy.Dunlap
Date: Mon Oct 18 2004 - 13:14:32 EST


Andi Kleen wrote:
On Mon, 18 Oct 2004 10:13:11 -0700
"Randy.Dunlap" <rddunlap@xxxxxxxx> wrote:


Marc Bevand wrote:

On 2004-10-17, Randy.Dunlap <rddunlap@xxxxxxxx> wrote:
| I'm seeing this often during a kernel build on AIC79xx.
| I did one kernel build on SATA without seeing this.
| This is on a dual-Opteron IBM Workstation A with
| 2 GB RAM, SATA, & SCSI.
| [...]
| NMI Watchdog detected LOCKUP on CPU0, registers:
| [...]

You are not the first one to observe frequent watchdog timeout
lockup on dual Opteron systems during intense I/O operations,
see this thread:

http://thread.gmane.org/gmane.linux.ide/1933

Note: this does *not* seem to be SATA-related.

Hi,

Zwane suspected NMI spikes and advised me to disable nmi_watchdog
(nmi_watchdog=0). After doing that, a kernel build completes
successfully, although with many messages like these:

Uhhuh. NMI received for unknown reason 21.
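
To get a feel for how often these messages show up, and whether the reason value ever changes, the kernel log can be tallied. The following is only a rough sketch: it assumes the log text has already been captured into a string (for example from dmesg output) and that the reason is printed as a two-digit hex value, which is how the message above appears. The names `NMI_LINE` and `count_nmi_reasons` are made up for this illustration.

```python
import re
from collections import Counter

# Matches the message quoted above; the two hex digits are the reason byte.
# Assumption: the kernel prints the reason as exactly two hex digits.
NMI_LINE = re.compile(r"NMI received for unknown reason ([0-9a-fA-F]{2})")


def count_nmi_reasons(log_text):
    """Return a Counter mapping each reason value (lowercase hex) to its count."""
    return Counter(m.group(1).lower() for m in NMI_LINE.finditer(log_text))


if __name__ == "__main__":
    sample = (
        "Uhhuh. NMI received for unknown reason 21.\n"
        "some unrelated kernel message\n"
        "Uhhuh. NMI received for unknown reason 21.\n"
    )
    for reason, count in count_nmi_reasons(sample).items():
        print(f"reason {reason}: {count} occurrence(s)")
```

If every line reports the same reason, that points to repeated NMIs with no error status latched, rather than a mix of different error sources.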


Something on your system creates bogus NMI interrupts. What chipset
are you using exactly?

Sometimes chipsets can be programmed to raise NMIs when a PCI bus
error occurs.

21 is the normal state (PIT timer running, but no errors logged).

If you have an AMD 8131 it could be in theory erratum 54, but then
normally one of the error bits in reason should be set.
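
As an illustration of what "error bits in reason" means, here is a small sketch that decodes the reason byte. It assumes the value is the hex contents of the legacy NMI status port (0x61) and uses the conventional PC/AT bit layout; the bit descriptions and the names `PORT61_BITS`, `ERROR_MASK` and `decode_nmi_reason` are assumptions for this example, not taken from the kernel source.

```python
# Conventional PC/AT layout of port 0x61 (assumed; check the chipset datasheet).
PORT61_BITS = {
    0x80: "memory parity / PCI SERR# error",
    0x40: "I/O channel check (IOCHK#) error",
    0x20: "PIT timer 2 output high",
    0x10: "refresh toggle",
    0x08: "IOCHK# NMI disabled",
    0x04: "parity/SERR# NMI disabled",
    0x02: "speaker data enabled",
    0x01: "PIT timer 2 gate enabled",
}

# The two top bits are the ones that would indicate a logged error.
ERROR_MASK = 0xC0


def decode_nmi_reason(text):
    """Decode a hex reason string such as '21' into (value, flags, has_error)."""
    reason = int(text, 16)
    flags = [name for bit, name in PORT61_BITS.items() if reason & bit]
    return reason, flags, bool(reason & ERROR_MASK)


if __name__ == "__main__":
    value, flags, has_error = decode_nmi_reason("21")
    print(f"reason 0x{value:02x}: {', '.join(flags)}")
    print("error bits set" if has_error else "no error bits set")
```

Under these assumptions, 21 decodes to only the two timer bits, which matches the "PIT timer running, but no errors logged" reading above.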

Yes, it's an AMD-8111 / 8131 / 8151 / K8-northbridge machine.

--
~Randy