Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error

From: Andi Kleen
Date: Sun Oct 10 2010 - 10:13:16 EST


On Sun, Oct 10, 2010 at 03:07:13PM +0100, Alan Cox wrote:
> On Sat, 9 Oct 2010 14:49:46 +0800
> Huang Ying <ying.huang@xxxxxxxxx> wrote:
>
> > In general, unknown NMI is used by hardware and firmware to notify
> > the OS of fatal hardware errors. So Linux should treat an unknown NMI
> > as a hardware error and panic on it for better error
> > containment.
>
> Not entirely true. Older machines use NMI for all sorts of interesting
> purposes. In particular many 486 laptops trigger NMI as part of power
> management. (Hence the choice of the dazed and confused message.)

In general, on any post stone age x86 system, ...

> > These systems are identified via the presence of APEI HEST or
> > some PCI ID of the host bridge. The PCI ID of host bridge instead of
> > DMI ID is used, so that the checking can be done based on the platform
> > type instead of motherboard. This should be simpler and sufficient.
> >
> > The method to identify the platforms is designed by Andi Kleen.
>
> Why not make the new flag also a boot option so you can force it on for
> platforms where we don't auto whitelist it.

You can already set it at run time using sysctl.

echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
echo 1 > /proc/sys/kernel/panic_on_io_nmi
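
For scripting the two writes above, a minimal sketch follows. It only
wraps the same echo commands; the function name set_nmi_panic and the
SYSCTL_ROOT override are hypothetical helpers introduced here for
illustration, not part of the patch or of any existing tool.

```shell
#!/bin/sh
# SYSCTL_ROOT is a hypothetical override (useful for dry runs against a
# scratch directory); by default the real /proc/sys tree is used.
SYSCTL_ROOT="${SYSCTL_ROOT:-/proc/sys}"

# Write the given value to both NMI panic knobs and report the result.
# Knobs that are missing or not writable are reported on stderr and skipped.
set_nmi_panic() {
    value="$1"
    for knob in panic_on_unrecovered_nmi panic_on_io_nmi; do
        path="$SYSCTL_ROOT/kernel/$knob"
        if [ -w "$path" ]; then
            echo "$value" > "$path"
            echo "$knob = $(cat "$path")"
        else
            echo "$knob: not writable (missing knob or not root)" >&2
        fi
    done
}
```

Running "set_nmi_panic 1" as root should have the same effect as the two
echo lines; the setting does not survive a reboot unless the matching
kernel.panic_on_unrecovered_nmi and kernel.panic_on_io_nmi keys are also
placed in /etc/sysctl.conf.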

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.