Re: [2.6.17-rc5-mm2] crash when doing second suspend: BUG in arch/i386/kernel/nmi.c:174

From: Don Zickus
Date: Tue Jun 06 2006 - 18:59:30 EST


On Tue, Jun 06, 2006 at 03:15:07PM -0700, Andrew Morton wrote:
> On Tue, 6 Jun 2006 17:45:53 -0400
> Don Zickus <dzickus@xxxxxxxxxx> wrote:
>
> > On Tue, Jun 06, 2006 at 04:18:15PM +0200, Andi Kleen wrote:
> > >
> > > > Because he is using a i386 machine, the nmi watchdog is disabled by
> > > > default.
> > >
> > > I changed that - it's now on by default on i386 too.
> > >
> > > -Andi
> >
> > I am trying to create a patch for this problem and it just dawned on me,
> > how does one store the previous state in a suspend/resume path if the code
> > hotplugs all the cpus first? CPU0 is easy because an explicit
> > suspend/resume path is called, but it seems to be called last after all
> > the other cpus have been removed. How do I save the state?
>
> I'm really struggling to understand this question. If you're referring to
> some per-cpu state then a CPU hotplug handler would be appropriate?

Sorry. I got ahead of myself. My concern is how the suspend/resume code
works with device drivers on an SMP system. My initial impression was
that a subsystem registers with the suspend/resume layer, and that its
registered functions are then called on every suspend or resume.

Inside those functions I saved the previous state of the watchdog timer.
However, I learned today that my understanding was incorrect. Instead,
the _hotplug_ code is called first for every cpu _except_ cpu0. The
_suspend/resume_ functions are only called in the context of _cpu0_.

This breaks my design because, upon resume, the watchdog timers
automatically start on all cpus regardless of what the previous state
was (except on cpu0, where I saved the previous state through the
handlers).
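
To make the ordering concrete, here is a small userspace model of what I
think is happening. It is not kernel code and does not use any kernel
API; every name in it (NCPUS, wd_running, model_cpu_down() and so on) is
made up for illustration, and it assumes the hotplug "up" path always
restarts the watchdog.

```c
/* Userspace model only. All names here are hypothetical, not kernel API. */
#include <stdbool.h>

#define NCPUS 4

bool wd_running[NCPUS];  /* is the watchdog ticking on this cpu? */
bool cpu0_saved;         /* state saved by the cpu0 suspend handler */

/* Hotplug path: runs for cpu1..NCPUS-1 and saves nothing. */
void model_cpu_down(int cpu) { wd_running[cpu] = false; }

/* Assumption: bringing a cpu back up always restarts its watchdog. */
void model_cpu_up(int cpu) { wd_running[cpu] = true; }

/* Suspend/resume handlers: only ever run in the context of cpu0. */
void model_suspend(void)
{
    cpu0_saved = wd_running[0];
    wd_running[0] = false;
}

void model_resume(void) { wd_running[0] = cpu0_saved; }

/* One full cycle in the order described above. */
void model_suspend_resume_cycle(void)
{
    int cpu;

    for (cpu = 1; cpu < NCPUS; cpu++)
        model_cpu_down(cpu);    /* secondary cpus are removed first */
    model_suspend();            /* cpu0 is suspended last */

    model_resume();             /* cpu0 restores its saved state */
    for (cpu = 1; cpu < NCPUS; cpu++)
        model_cpu_up(cpu);      /* secondary cpus restart unconditionally */
}
```

If the watchdog was off everywhere before the cycle, cpu0 comes back
with it off, but every other cpu comes back with it running.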

So my question is/was: what is the proper way to handle processor-level
subsystems during the suspend/resume path on an SMP system? I really
don't understand the hotplug path or the suspend/resume path very well.

I didn't want to register a hotplug handler because a hotplug event is
really different from a suspend event (I want to _save_ info during a
suspend event). The documentation I was reading seemed to suggest that
hotplug/suspend/smp is still a work in progress.

Is the typical approach to just hack in an extra parameter to the
start/stop functions of the nmi_watchdog letting the function know it is
coming through the suspend/resume path?
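
Roughly what I have in mind, again as a userspace sketch rather than a
patch: the enum, the arrays and the wd_start()/wd_stop() signatures
below are all hypothetical and are not the existing nmi_watchdog
interface. How the caller would actually know which path it is on is
exactly the part I am unsure about.

```c
/* Userspace sketch only. Names and signatures are hypothetical. */
#include <stdbool.h>

#define NCPUS 4

/* The "extra parameter": why the watchdog is being stopped or started. */
enum wd_reason { WD_HOTPLUG, WD_SUSPEND };

bool wd_active[NCPUS];  /* current per-cpu watchdog state */
bool wd_saved[NCPUS];   /* per-cpu state remembered across suspend */

void wd_stop(int cpu, enum wd_reason why)
{
    /* Only the suspend path remembers the previous state. */
    if (why == WD_SUSPEND)
        wd_saved[cpu] = wd_active[cpu];
    wd_active[cpu] = false;
}

void wd_start(int cpu, enum wd_reason why)
{
    /* Resume restores what was saved; plain hotplug always starts. */
    wd_active[cpu] = (why == WD_SUSPEND) ? wd_saved[cpu] : true;
}
```

With this, a cpu whose watchdog was off before suspend stays off after
resume, while an ordinary hotplug add still starts it.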

Any tips, code, or other docs would be helpful.

Cheers,
Don