Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI

From: Andi Kleen
Date: Mon Sep 13 2010 - 14:07:19 EST



>
> Honestly, I don't think you need much screen real estate. It would be
> nice when an unknown NMI comes in, if the kernel just pokes around
> the hardware registers and displays a summary of what it found. For
> example,
>
> The following devices had error bits set in the status registers:
> PCI device x:y.z - STATUS_BIT1 | STATUS_BIT2
> HW device xyz - STATUS_BIT3
> ...

You mean data from the generic PCI config space?
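
To make the "generic PCI config space" idea concrete, here is a rough
userspace sketch (not kernel code) of decoding the error bits of the standard
PCI Status register (config offset 0x06) into the kind of summary line quoted
above. The bit values are the standard ones and mirror the PCI_STATUS_*
constants in the kernel's pci_regs.h; the ST_* names, the status_bits table
and decode_pci_status() are made up for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Error bits of the standard PCI Status register (hypothetical ST_* names). */
#define ST_MASTER_DATA_PARITY 0x0100u
#define ST_SIG_TARGET_ABORT   0x0800u
#define ST_REC_TARGET_ABORT   0x1000u
#define ST_REC_MASTER_ABORT   0x2000u
#define ST_SIG_SYSTEM_ERROR   0x4000u
#define ST_DETECTED_PARITY    0x8000u

struct status_bit {
    uint16_t mask;
    const char *name;
};

static const struct status_bit status_bits[] = {
    { ST_MASTER_DATA_PARITY, "MASTER_DATA_PARITY" },
    { ST_SIG_TARGET_ABORT,   "SIG_TARGET_ABORT" },
    { ST_REC_TARGET_ABORT,   "REC_TARGET_ABORT" },
    { ST_REC_MASTER_ABORT,   "REC_MASTER_ABORT" },
    { ST_SIG_SYSTEM_ERROR,   "SIG_SYSTEM_ERROR" },
    { ST_DETECTED_PARITY,    "DETECTED_PARITY" },
};

/*
 * Write the names of all error bits set in 'status' into 'buf', joined
 * by " | ". Returns the number of error bits set. If 'buf' is too small
 * the text is truncated, but the return value still counts every bit.
 */
static int decode_pci_status(uint16_t status, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    if (len > 0)
        buf[0] = '\0';

    for (size_t i = 0; i < sizeof(status_bits) / sizeof(status_bits[0]); i++) {
        if (!(status & status_bits[i].mask))
            continue;
        if (used < len) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             count ? " | " : "", status_bits[i].name);
            if (n < 0 || (size_t)n >= len - used)
                used = len;        /* truncated: stop appending */
            else
                used += (size_t)n;
        }
        count++;
    }
    return count;
}
```

A caller would read the 16-bit status word of each device, call
decode_pci_status(), and print one "PCI device x:y.z - ..." line for every
device where the return value is non-zero. Only fixed, generic registers are
touched, so no driver code has to run.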

I don't think I would feel comfortable with arbitrary driver callbacks
(the risk of the driver breaking the panic would be high).

But if it's generic, then even if it is not on the screen it should
at least be in the error serialization data and logged after boot.

At least on PCI-E it may be enough to simply dump all recent AER
data.

>
> But I guess if we accept the fact that an unknown NMI will panic the
> box, then we can probably be a little more liberal in breaking
> spinlocks and poking around the hardware to display some useful info.

You have to be a bit careful with that; you may cause nested errors
(e.g. machine checks or more NMIs). I suppose this could be checked for
though.
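
One possible way to check for this, shown only as a userspace C11 sketch and
not as anything from the patch series: a single atomic flag marks that an
error dump is in progress, so a nested error that re-enters the dump path can
notice and skip the hardware poking. The names in_error_dump,
error_dump_enter() and error_dump_exit() are hypothetical; real kernel code
would use the kernel's own atomics rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set while an error dump is running (hypothetical name). */
static atomic_flag in_error_dump = ATOMIC_FLAG_INIT;

/*
 * Try to start an error dump. Returns true if the caller may go ahead,
 * false if a dump is already in progress, i.e. this is a nested error.
 */
static bool error_dump_enter(void)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set(&in_error_dump);
}

/* Mark the dump as finished so a later, unrelated error can dump again. */
static void error_dump_exit(void)
{
    atomic_flag_clear(&in_error_dump);
}
```

The dump path would call error_dump_enter() first and go straight to the
panic without touching any more hardware if it returns false.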

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.