Re: nmi_watchdog=2 regression in 2.6.21

From: Stephane Eranian
Date: Thu Aug 30 2007 - 17:07:01 EST


Daniel,

On Wed, Aug 29, 2007 at 06:21:59PM -0700, Daniel Walker wrote:
> On Wed, 2007-08-29 at 14:24 -0700, Stephane Eranian wrote:
>
>
> > Now on Core Duo, there is no PEBS anyway, so it is okay to use counter 0
> > for NMI. The problem is that the detection code in perfctr-watchdog.c
> > treats a Core Duo and a Core 2 Duo the same way as they both have the
> > X86_FEATURE_ARCH_PERFMON bit set.
> >
> > I have attached a patch which handles the case of the Core Duo. Unfortunately,
> > I do not own one so I cannot test it. I would appreciate it if you could
> > try re-applying my counter 0 -> 1 patch + this new one to see if you
> > have the problem with the NMI getting stuck.
>
> I tested your patch .. The system doesn't hang, but the NMI seems to
> disappear .. The check_nmi_watchdog() is not called, and the NMI never
> actually starts firing .. Is that what you had intended?
>
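Yes, I realized I missed a small detail in the switch statement: the
family 6 model check also has to admit model 14 (0xe). Otherwise the
Core Duo skips intel_arch_wd_ops, hits the early return, and never gets
any wd_ops, which is why the NMI never started firing.
Could you try the new version?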
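
To make the intended selection easier to follow, here is a rough user-space
sketch of the Intel family 6 path only. It is not the kernel code:
struct cpu_id, enum wd_choice and pick_wd_ops() are hypothetical names made
up for illustration, and the max_p6_model parameter exists only so that the
old (0xd) and new (0xe) limits can be compared side by side.

```c
/* Hypothetical stand-ins for the fields of boot_cpu_data used by the patch. */
struct cpu_id {
	int family;        /* boot_cpu_data.x86 */
	int model;         /* boot_cpu_data.x86_model */
	int arch_perfmon;  /* X86_FEATURE_ARCH_PERFMON set? */
};

/* Which wd_ops the probe would end up selecting (illustrative only). */
enum wd_choice { WD_NONE, WD_P6, WD_INTEL_ARCH };

/* Same test as is_coreduo() in the patch: family 6, model 14. */
static int is_coreduo(const struct cpu_id *c)
{
	return c->family == 6 && c->model == 14;
}

/*
 * Models only the Intel family 6 branch of probe_nmi_watchdog().
 * max_p6_model is the highest model still handled by p6_wd_ops
 * (0xd before this patch, 0xe with it).
 */
static enum wd_choice pick_wd_ops(const struct cpu_id *c, int max_p6_model)
{
	if (c->arch_perfmon && !is_coreduo(c))
		return WD_INTEL_ARCH;   /* counter 1, leaves counter 0 for PEBS */

	if (c->family == 6) {
		if (c->model > max_p6_model)
			return WD_NONE; /* no wd_ops, watchdog never starts */
		return WD_P6;           /* counter 0 */
	}

	return WD_NONE;
}
```

With the 0xd limit a Core Duo (family 6, model 14) ends up with WD_NONE,
which matches what you saw. With 0xe it gets WD_P6, while a Core 2 Duo
(family 6, model 15) still gets WD_INTEL_ARCH.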

Thanks.

--
-Stephane
diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
index 9b5d6af..3a945f0 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -613,6 +613,17 @@ static struct wd_ops intel_arch_wd_ops = {
 	.evntsel = MSR_ARCH_PERFMON_EVENTSEL1,
 };
 
+/*
+ * Check for Intel Core Duo because it has a bug with PERFEVTSEL1
+ * (see Specification Update bug AE49) and must use PERFEVTSEL0. We cannot
+ * use this counter on other processors supporting X86_FEATURE_ARCH_PERFMON
+ * because PEBS requires it.
+ */
+static inline int is_coreduo(void)
+{
+	return boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 14;
+}
+
 static void probe_nmi_watchdog(void)
 {
 	switch (boot_cpu_data.x86_vendor) {
@@ -623,13 +634,14 @@ static void probe_nmi_watchdog(void)
 		wd_ops = &k7_wd_ops;
 		break;
 	case X86_VENDOR_INTEL:
-		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)
+		    && !is_coreduo()) {
 			wd_ops = &intel_arch_wd_ops;
 			break;
 		}
 		switch (boot_cpu_data.x86) {
 		case 6:
-			if (boot_cpu_data.x86_model > 0xd)
+			if (boot_cpu_data.x86_model > 0xe)
 				return;
 
 			wd_ops = &p6_wd_ops;