Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error

From: Huang Ying
Date: Wed Oct 20 2010 - 02:12:44 EST


Hi, Don,

On Tue, 2010-10-12 at 05:20 +0800, Don Zickus wrote:
> > @@ -366,6 +368,15 @@ unknown_nmi_error(unsigned char reason,
> > if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
> > NOTIFY_STOP)
> > return;
> > + /*
> > + * On some platforms, hardware errors may be notified via
> > + * unknown NMI
> > + */
> > + if (unknown_nmi_as_hwerr)
> > + panic(
> > + "NMI for hardware error without error record: Not continuing\n"
> > + "Please check BIOS/BMC log for further information.");
> > +
> > #ifdef CONFIG_MCA
> > /*
> > * Might actually be able to figure out what the guilty party
>
> The only quirk I have left is the above piece, which is basically a
> philosophy difference with Robert and myself. Where we believe it should
> be on the die_chain and Andi and yourself would like to see it explicitly
> called out.

After some more thought, I think this is different from the DIE_NMI and
DIE_NMI_IPI cases. The added code is general unknown NMI processing, not
something that belongs to a device driver. We are not adding special
processing for particular devices; we treat unknown NMI as a hardware
error notification in general, and use a white list to deal with broken
hardware and stone-age machines. Do you agree?

If so, it should not be turned into a notifier block unless you want to
turn all general unknown NMI processing code into a notifier block.
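
To make the ordering concrete, here is a minimal userspace sketch of the
control flow being argued for: the die chain gets the first chance to
claim the NMI, and only then does the general policy run inline. It is a
model, not kernel code. The names unknown_nmi_model, struct machine,
nmi_whitelist and the machine names are hypothetical, and the sketch
assumes the white list names the machines on which the policy is
enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes of handling one unknown NMI in this model. */
enum nmi_action {
	NMI_HANDLED_BY_CHAIN,	/* a notifier claimed it (NOTIFY_STOP) */
	NMI_PANIC_HWERR,	/* treated as a fatal hardware error */
	NMI_LEGACY_REPORT	/* fall through to the older reporting path */
};

/* Hypothetical machine description; not a kernel structure. */
struct machine {
	const char *name;
	bool chain_claims_nmi;	/* models notify_die() == NOTIFY_STOP */
};

/*
 * Illustrative white list. Assumption: it names machines on which an
 * unknown NMI is known to signal a hardware error. Entries are made up.
 */
static const char *const nmi_whitelist[] = {
	"known-good-server",
	"known-good-blade",
};

/* Return true if the machine name appears on the illustrative list. */
static bool on_whitelist(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(nmi_whitelist) / sizeof(nmi_whitelist[0]); i++)
		if (strcmp(nmi_whitelist[i], name) == 0)
			return true;
	return false;
}

/*
 * Model of the proposed ordering: device-specific handlers on the
 * chain run first; the general hardware-error policy stays inline.
 */
enum nmi_action unknown_nmi_model(const struct machine *m)
{
	bool unknown_nmi_as_hwerr;

	assert(m != NULL && m->name != NULL);

	/* Step 1: notifier chain, where device-specific handlers live. */
	if (m->chain_claims_nmi)
		return NMI_HANDLED_BY_CHAIN;

	/* Step 2: general policy, applied inline rather than via a notifier. */
	unknown_nmi_as_hwerr = on_whitelist(m->name);
	if (unknown_nmi_as_hwerr)
		return NMI_PANIC_HWERR;

	/* Step 3: everything else keeps the existing behaviour. */
	return NMI_LEGACY_REPORT;
}
```

In this model a machine that is not on the list never reaches the panic
path, and a notifier that claims the NMI always wins over the general
policy.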

Best Regards,
Huang Ying

