Re: [2.6.17-rc5-mm2] crash when doing second suspend: BUG in arch/i386/kernel/nmi.c:174

From: Don Zickus
Date: Tue Jun 06 2006 - 22:44:36 EST


Makes the start/stop paths of nmi watchdog more robust to handle the
suspend/resume cases more gracefully.

Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>

---

On Tue, Jun 06, 2006 at 09:23:59AM -0700, Jeremy Fitzhardinge wrote:
> Shaohua Li wrote:
> >Does below patch help? The nmi suspend/resume doesn't look good to me.
> >Only CPU0 uses the suspend/resume code path. Other CPUs run the CPU
> >hotplug code path.
> >
> Unfortunately this just oopses immediately on the first suspend. The
> stack trace is unclear (and I'm just going from memory at the moment),
> but it looked like it got an invalid op. I'll try to get a clearer idea
> of the crash later today.
>
> J

Can you apply this patch on top of Shaohua's. This should fix all your
suspend problems.

Inside the patch is a little hack to handle the scenario when we come out
of resume we do _not_ want the nmi watchdog enabled (to match the
case entering suspend).

Compiled but not tested, as I don't have easy access to my test machines
right now. Mainly posted for Andrew to pick up for rc6-mm1.

Cheers,
Don

Index: linux-don/arch/i386/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/i386/kernel/nmi.c
+++ linux-don/arch/i386/kernel/nmi.c
@@ -745,6 +745,7 @@ static void stop_intel_arch_watchdog(voi

void setup_apic_nmi_watchdog (void *unused)
{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
#ifdef CONFIG_LOCKDEP
/*
* The NMI watchdog uses spinlocks (notifier chains, etc.),
@@ -761,6 +762,14 @@ void setup_apic_nmi_watchdog (void *unus
(nmi_watchdog != NMI_IO_APIC))
return;

+ if (wd->enabled == 1)
+ return;
+
+ /* cheap hack to support suspend/resume */
+ /* if cpu0 is not active neither should the other cpus */
+ if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
+ return;
+
if (nmi_watchdog == NMI_LOCAL_APIC) {
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
@@ -798,17 +807,22 @@ void setup_apic_nmi_watchdog (void *unus
return;
}
}
- __get_cpu_var(nmi_watchdog_ctlblk.enabled) = 1;
+ wd->enabled = 1;
atomic_inc(&nmi_active);
}

void stop_apic_nmi_watchdog(void *unused)
{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
/* only support LOCAL and IO APICs for now */
if ((nmi_watchdog != NMI_LOCAL_APIC) &&
(nmi_watchdog != NMI_IO_APIC))
return;

+ if (wd->enabled == 0)
+ return;
+
if (nmi_watchdog == NMI_LOCAL_APIC) {
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
@@ -836,7 +850,7 @@ void stop_apic_nmi_watchdog(void *unused
return;
}
}
- __get_cpu_var(nmi_watchdog_ctlblk.enabled) = 0;
+ wd->enabled = 0;
atomic_dec(&nmi_active);
}

Index: linux-don/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/nmi.c
+++ linux-don/arch/x86_64/kernel/nmi.c
@@ -672,6 +672,7 @@ static void stop_intel_arch_watchdog(voi

void setup_apic_nmi_watchdog(void *unused)
{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
#ifdef CONFIG_LOCKDEP
/*
* The NMI watchdog uses spinlocks (notifier chains, etc.),
@@ -688,6 +689,14 @@ void setup_apic_nmi_watchdog(void *unuse
(nmi_watchdog != NMI_IO_APIC))
return;

+ if (wd->enabled == 1)
+ return;
+
+ /* cheap hack to support suspend/resume */
+ /* if cpu0 is not active neither should the other cpus */
+ if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
+ return;
+
if (nmi_watchdog == NMI_LOCAL_APIC) {
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
@@ -709,17 +718,22 @@ void setup_apic_nmi_watchdog(void *unuse
return;
}
}
- __get_cpu_var(nmi_watchdog_ctlblk.enabled) = 1;
+ wd->enabled = 1;
atomic_inc(&nmi_active);
}

void stop_apic_nmi_watchdog(void *unused)
{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
/* only support LOCAL and IO APICs for now */
if ((nmi_watchdog != NMI_LOCAL_APIC) &&
(nmi_watchdog != NMI_IO_APIC))
return;

+ if (wd->enabled == 0)
+ return;
+
if (nmi_watchdog == NMI_LOCAL_APIC) {
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
@@ -738,7 +752,7 @@ void stop_apic_nmi_watchdog(void *unused
return;
}
}
- __get_cpu_var(nmi_watchdog_ctlblk.enabled) = 0;
+ wd->enabled = 0;
atomic_dec(&nmi_active);
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/