Re: nmi watchdog failure on dual Athlon box

From: Joerg Sommrey
Date: Tue Sep 28 2004 - 13:32:18 EST


On Tue, Sep 28, 2004 at 06:08:37PM +0100, Maciej W. Rozycki wrote:
> On Tue, 28 Sep 2004, Joerg Sommrey wrote:
>
> > just tried Ingo's "lockupcli" nmi watchdog test - it fails to unlock the
> > box.
> >
> > boot-parm:
> > ...nmi_watchdog=2...
>
> The local APIC NMI watchdog has limited capabilities. It may fail to
> trigger for certain lockups because there is no available event that would
> happen periodically regardless of the CPU state. I can only suspect what
> "lockupcli" does (where is it available from, anyway?), but if it runs
> "cli; hlt", then the watchdog *will* fail.

Here's the quote from Ingo's mail:
In <2Jo20-7ry-33@xxxxxxxxxxxxxxxx> Ingo Molnar <mingo@xxxxxxx> writes:
|once the NMI watchdog is up and running it should catch all hard lockups
|and print backtraces to the serial console - even if you are within X
|while the lockup happens. You can test hard lockups by running the
|attached 'lockupcli' userspace code as root - it turns off interrupts
|and goes into an infinite loop => instant lockup. The NMI watchdog
|should notice this condition after a couple of seconds and should abort
|the task, printing a kernel trace as well. Your box should be back in
|working order after that point.

[...]

|--- lockupcli.c
|
|main ()
|{
| iopl(3);
| for (;;) asm("cli");
|}

Does this mean it is worth investigating further why the IO-APIC NMI
watchdog doesn't work? Until now I thought it would be OK as long as
the local APIC NMI watchdog is set up.
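
One way to check whether NMIs are being delivered at all, independent of
a lockup test, is to sample the NMI line of /proc/interrupts twice and
compare the per-CPU counts. The following is a minimal sketch in C, not
taken from this thread: the helper names parse_nmi_counts and
nmi_advancing are hypothetical, and it assumes the usual layout where the
line starts with "NMI:" followed by one counter per CPU.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the per-CPU counters from the "NMI:" line of
 * text laid out like /proc/interrupts. Returns the number of counters
 * stored in counts[] (at most max), or -1 if no "NMI:" line is present. */
int parse_nmi_counts(const char *text, unsigned long *counts, int max)
{
    const char *line = text;

    assert(text != NULL && counts != NULL && max >= 0);

    while (line && *line) {
        const char *p = line;

        /* Labels in /proc/interrupts may be indented. */
        while (*p == ' ' || *p == '\t')
            p++;

        if (strncmp(p, "NMI:", 4) == 0) {
            int n = 0;

            p += 4;
            while (n < max) {
                char *end;

                while (*p == ' ' || *p == '\t')
                    p++;
                if (!isdigit((unsigned char)*p))
                    break;  /* end of line or a trailing description */
                counts[n++] = strtoul(p, &end, 10);
                p = end;
            }
            return n;
        }

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Hypothetical helper: returns 1 if every CPU's NMI count grew between
 * the two samples, 0 otherwise (including when ncpu is not positive). */
int nmi_advancing(const unsigned long *before, const unsigned long *after,
                  int ncpu)
{
    for (int i = 0; i < ncpu; i++)
        if (after[i] <= before[i])
            return 0;
    return ncpu > 0;
}
```

Feeding it two snapshots of /proc/interrupts taken a few seconds apart
shows whether each CPU's counter moved. A moving counter only shows that
NMIs arrive in the sampled state; it says nothing about states like the
"cli; hlt" case mentioned above.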

-jo

--
-rw-r--r-- 1 jo users 63 2004-09-28 18:42 /home/jo/.signature