Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI

From: Don Zickus
Date: Mon Sep 13 2010 - 10:11:59 EST


On Mon, Sep 13, 2010 at 10:19:49AM +0800, Huang Ying wrote:
> On Sat, 2010-09-11 at 02:40 +0800, Don Zickus wrote:
> > On Fri, Sep 10, 2010 at 06:19:29PM +0200, Andi Kleen wrote:
> > >
> > > > I am grasping for straws here, but is there a register that APEI/HEST
> > > > can poke to see if it generated the NMI?
> > >
> > > HEST knows this yes.
> > >
> > > But this is not about HEST errors, but about those without HEST
> > > handling.
> >
> > Don't most unknown NMIs fall into the same boat, that they were not being
> > handled properly?
>
> As far as I know, at least on some platforms, unknown NMIs are used for
> hardware error reporting. They will cause "Blue Screen" in Windows.

Unfortunately, in most of the bugzillas I deal with, unknown NMIs are the
result of SERRs. While you can consider that hardware error reporting,
the easiest way for me to debug those problems currently is to have
reporters run 'lspci -vvv' after the NMI is displayed to figure out who
caused the NMI.

My fear is that panic'ing the box on unknown NMIs on those platforms will
hinder my ability to easily debug those NMIs.

>
> > On the other hand could you use the die_notifier_chain(DIE_UNKNOWNNMI) for
> > the same purpose and keep the unknown_nmi_error() handler a little
> > cleaner?
>
> I think explicit function call has better readability than notifier
> chain.

Ok. What criteria should we establish to determine which functions go on
the notifier chain and which ones can be explicitly called?
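
To make the trade-off concrete, here is a rough userspace sketch of the two
styles. It is not the kernel's notifier API: every name in it
(nmi_handler_fn, chain_register, hw_error_handler and so on) is made up for
illustration, and the 0x80 "reason" bit is an arbitrary stand-in for "this
handler recognizes the event".

```c
#include <stddef.h>

/* Hypothetical return codes: keep walking, or stop because the event was claimed. */
enum { HANDLER_DONE = 0, HANDLER_STOP = 1 };

typedef int (*nmi_handler_fn)(unsigned char reason);

/* One entry on the (hypothetical) chain; higher priority runs first. */
struct handler_node {
	nmi_handler_fn fn;
	int priority;
	struct handler_node *next;
};

static struct handler_node *chain_head;

/* Insert a node so the list stays sorted by descending priority. */
static void chain_register(struct handler_node *n)
{
	struct handler_node **pp = &chain_head;

	while (*pp && (*pp)->priority >= n->priority)
		pp = &(*pp)->next;
	n->next = *pp;
	*pp = n;
}

/* Walk the chain; return 1 as soon as a handler claims the event, else 0. */
static int chain_call(unsigned char reason)
{
	struct handler_node *n;

	for (n = chain_head; n; n = n->next)
		if (n->fn(reason) == HANDLER_STOP)
			return 1;
	return 0;
}

/* Illustrative handler: claims the event when the (arbitrary) 0x80 bit is set. */
static int hw_error_handler(unsigned char reason)
{
	return (reason & 0x80) ? HANDLER_STOP : HANDLER_DONE;
}

/* Illustrative handler: observes the event but never claims it. */
static int debug_handler(unsigned char reason)
{
	(void)reason;
	return HANDLER_DONE;
}

/* Style 1: explicit calls. The order is visible right here in the caller. */
static int unknown_nmi_explicit(unsigned char reason)
{
	if (hw_error_handler(reason) == HANDLER_STOP)
		return 1;
	if (debug_handler(reason) == HANDLER_STOP)
		return 1;
	return 0;
}

/* Style 2: notifier chain. The caller stays small; order comes from priorities. */
static int unknown_nmi_chained(unsigned char reason)
{
	return chain_call(reason);
}
```

With explicit calls the ordering is readable at the call site, but every new
user has to touch the handler. With the chain the handler stays untouched,
but the ordering is only visible by looking at who registered with which
priority.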

Cheers,
Don