Re: troubleshooting/debugging hard locks

From: Zan Lynx
Date: Wed May 14 2008 - 19:43:20 EST


On Wed, 2008-05-14 at 15:43 -0700, Ray Lee wrote:
> On Wed, May 14, 2008 at 12:27 PM, Lee Howard <faxguy@xxxxxxxxxxxxxxxx> wrote:

> > But, without kernel messages indicating where to look to debug... what is
> > the best approach to start troubleshooting and debugging this condition? Is
> > there some general debug feature that can be enabled in the kernel that
> > would help hone in on the culprit?
>
> There's something called the NMI watchdog, that will print debugging
> messages out if it finds the system has hard locked. The short version
> is that you should add "nmi_watchdog=1" (no quotes) to the line in
> GRUB that has the kernel options. That assumes you have an APIC on the
> system. If that's not the case (you're on Uniprocessor, and no APIC)
> then you can try nmi_watchdog=2 instead. That'll only work on some
> systems, though.
>
> Better docs (than my cheesy writeup) are in
> Documentation/nmi_watchdog.txt in the kernel source distribution.

I was once told to add these to the kernel command line as well when
using NMI watchdog and they do seem to help it trigger more reliably:

"idle=poll nohz=off"

--
Zan Lynx <zlynx@xxxxxxx>

Attachment: signature.asc
Description: This is a digitally signed message part