Re: [RFC 1/6] x86, NMI, Add symbol definition for NMI magic constants

From: Don Zickus
Date: Fri Sep 24 2010 - 10:29:46 EST


On Fri, Sep 24, 2010 at 07:50:16PM +0800, huang ying wrote:
> On Thu, Sep 23, 2010 at 10:16 PM, Don Zickus <dzickus@xxxxxxxxxx> wrote:
> >> On some systems, there is a hardware error log in the BMC/BIOS. The
> >> hardware error log can be retrieved via IPMI or the BIOS menu. Otherwise,
> >> can we get some useful info for an unknown NMI? If we can, can we collect
> >> the info, then print it on the console and save it to flash via ERST (part
> >> of APEI too) before panic?
> >
> > Ok.  Does the BIOS/BMC automatically do this?  Can we just print a message
> > on panic saying to check your BIOS/BMC logs for more info?
>
> Yes. The BIOS/BMC does that automatically. And I will add it to the panic
> message.
>
> > I would love to add code to gather more useful info for unknown NMIs, but
> > is it expected that HEST does some of this?  I guess what I am trying to
> > figure out is: if we are going to add intelligence to detect a HEST-enabled
> > machine and panic when an unknown NMI comes along (presumably from HEST??),
> > can we leverage HEST at all to understand why the NMI happened, or
> > point the user to the BIOS/BMC to get more info?  In other words, what
> > value do we get from HEST other than "we detect it's there, let's panic"?
>
> Yes. HEST can be used to report some hardware error information. I am
> working on that now.
>
> >> HEST is defined in ACPI spec 4.0 and later versions, in the section named
> >> APEI (ACPI Platform Error Interface). It is used to describe the error
> >> sources of the system. It should be available only on server platforms.
> >
> > Ok.  Does the kernel or the BIOS have the intelligence to use it yet?
>
> HEST works in a cooperative way between the kernel and the BIOS. I am
> working on a HEST driver which will get notified on NMI and collect the
> error information reported by the BIOS. But it is possible that some
> systems have only a BMC/BIOS log and do not report anything to the OS
> except an unknown NMI. The unknown NMI panic logic is for these systems.

Ah ok, thanks for the info. I think adding the info to the panic message
would be valuable. I have no more objections to your patch now. :-)

I appreciate your patience in cluing me in on how HEST works!

Cheers,
Don