Re: Uhhuh. NMI received for unknown reason 2c on CPU 0.

From: Borislav Petkov
Date: Tue Mar 05 2013 - 19:19:39 EST


On Wed, Mar 06, 2013 at 01:13:23AM +0100, Rafael J. Wysocki wrote:
> I suspected that during resume from hibernation the boot kernel (the
> one that loaded the image) did something to hardware and the restored
> kernel didn't handle that change properly. It is hard to say what
> piece of hardware that was, however (it might or might not be the NIC,
> it may be pure coincidence that the NMI messages appear in the log at
> this point).

Agreed with the second part. As for the first part, about who
communicates what to whom: come to think of it, it might not be related
to any devices at all.

Here's why I think so:

So one of the things I did to trigger this is boot the machine, run
powertop and set all the knobs in the "Tunables" tab to "Good". One of
the tunables is called turn-off-nmi-watchdog or something like that. It
turns off the watchdog, which uses the perf infrastructure to generate
an NMI whenever the counter overflows.

Now, imagine I do that in the "normal" kernel, then suspend,
...<something happens or does not happen>, then resume back into the
normal kernel and it somehow "forgets" the fact that we disabled the NMI
watchdog before the suspend cycle. And boom, it gets a single spurious
NMI.
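
One way to poke at this theory would be to snapshot the watchdog setting
and the per-CPU NMI counts before suspend and again after resume. Below
is a minimal sketch, assuming the usual /proc/sys/kernel/nmi_watchdog
sysctl and the "NMI:" row in /proc/interrupts are present on the box;
the helper names are made up for illustration and are not part of any
existing tool.

```python
#!/usr/bin/env python3
"""Snapshot NMI watchdog state and per-CPU NMI counts (illustrative sketch)."""
from pathlib import Path

# Assumed locations; both exist on typical x86 Linux systems.
WATCHDOG = Path("/proc/sys/kernel/nmi_watchdog")  # "0" = off, "1" = on
INTERRUPTS = Path("/proc/interrupts")


def parse_nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts content."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Counters follow the label; the trailing description is not numeric.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def snapshot():
    """Read the watchdog setting and NMI counts (hypothetical helper)."""
    state = WATCHDOG.read_text().strip() if WATCHDOG.exists() else None
    counts = parse_nmi_counts(INTERRUPTS.read_text()) if INTERRUPTS.exists() else []
    return state, counts


def diff_counts(before, after):
    """Per-CPU increase in NMI count between two snapshots."""
    return [b - a for a, b in zip(before, after)]


if __name__ == "__main__":
    state, counts = snapshot()
    print("nmi_watchdog:", state)
    print("NMI per CPU:", counts)
```

Run it once before suspending and once after resume, then compare the
two outputs. If nmi_watchdog still reads 0 after resume but the NMI
count on some CPU went up, that would at least be consistent with the
watchdog state being "forgotten" somewhere below the sysctl.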

Does it make sense? I dunno - I'm just connecting the dots here between
the observations that look most likely to be related.

Anyway, it's getting late, good night. :)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.