Re: [PATCH 3/3 v2] nmi_watchdog: config option to enable new nmi_watchdog

From: Don Zickus
Date: Mon Feb 08 2010 - 09:58:49 EST


On Mon, Feb 08, 2010 at 08:19:54AM +0100, Ingo Molnar wrote:
>
> * Don Zickus <dzickus@xxxxxxxxxx> wrote:
>
> > +config NMI_WATCHDOG
> > +	bool "Detect Hard Lockups with an NMI Watchdog"
> > +	depends on DEBUG_KERNEL && PERF_EVENTS
> > +	default y
> > +	help
> > +	  Say Y here to enable the kernel to use the NMI as a watchdog
> > +	  to detect hard lockups. This is useful when a cpu hangs for no
> > +	  reason but can still respond to NMIs. A backtrace is displayed
> > +	  for reviewing and reporting.
> > +
> > +	  The overhead should be minimal, just an extra NMI every few
> > +	  seconds.
>
> Thought for later patches: I think an architecture should be able to express
> via a Kconfig switch that it actually _has_ NMI events. There are architectures
> which don't have a PMU driver and only have software events. There are also
> architectures that have a PMU driver but no NMIs.
>
> Something like ARCH_HAS_NMI_PERF_EVENTS?

I guess I assumed the perf event subsystem would take care of that, which
is why I made the config option depend on PERF_EVENTS. I am open to
suggestions on how to enhance it.

>
> Also, i haven't checked, but what is the practical effect of the new generic
> watchdog on x86 CPUs that do not have a native PMU driver yet - such as
> P4s?

I believe the call to perf_event_create_kernel_counter() would fail, which
would then prevent the cpu from coming online. Probably not the smartest
thing to do. I was looking at adding code to fall back to trying
PERF_TYPE_SOFTWARE. Let me dig up a P4 box and see what happens.
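
To make the fallback idea concrete, here is a rough, untested userspace
model of the control flow, not the patch code. Everything named wd_* and
no_pmu_create is made up for illustration: the create callback stands in
for perf_event_create_kernel_counter(), and the two enum values stand in
for PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE. The error value is just an
errno-style placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PERF_TYPE_HARDWARE / PERF_TYPE_SOFTWARE (hypothetical names). */
enum wd_event_type { WD_TYPE_HARDWARE, WD_TYPE_SOFTWARE };

/*
 * Hypothetical stand-in for the kernel-side counter creation call.
 * Returns 0 on success, a negative errno-style value on failure.
 */
typedef int (*wd_create_fn)(enum wd_event_type type, int cpu);

/*
 * Try a hardware event first, then fall back to a software event.
 * Returns 0 and stores the type that worked in *used, or returns the
 * last error if both attempts fail. The caller is expected to treat a
 * failure as "no watchdog on this cpu" rather than blocking bring-up.
 */
int wd_enable(wd_create_fn create, int cpu, enum wd_event_type *used)
{
	const enum wd_event_type order[] = { WD_TYPE_HARDWARE, WD_TYPE_SOFTWARE };
	int err = -1;
	size_t i;

	assert(create != NULL && used != NULL);

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		err = create(order[i], cpu);
		if (err == 0) {
			*used = order[i];
			return 0;
		}
	}
	return err;
}

/* Models a cpu without a PMU driver: hardware events are refused. */
int no_pmu_create(enum wd_event_type type, int cpu)
{
	(void)cpu;
	return type == WD_TYPE_HARDWARE ? -2 /* placeholder error */ : 0;
}
```

Calling wd_enable(no_pmu_create, 0, &used) models the P4 case: the
hardware attempt is refused and the software event is used instead. The
point is only that a failed hardware counter should degrade the watchdog
rather than keep the cpu offline.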

>
> Anyway, i'll create a tip:perf/nmi topic branch for these patches, it
> certainly looks like a useful generalization and a new architecture that has
> perf could easily enable it, without having to write its own NMI watchdog
> implementation. It's also useful for any new watchdog features that people
> might want to add. Plus it makes the x86 PMU code cleaner in the long run as
> well.

Agreed.

Cheers,
Don