Re: [PATCH] [3/4] x86: MCE: Improve mce_get_rip

From: Andi Kleen
Date: Fri Apr 24 2009 - 04:47:16 EST


On Fri, Apr 24, 2009 at 04:28:56PM +0900, Hidetoshi Seto wrote:
> Huang Ying wrote:
> > On Fri, 2009-04-24 at 14:16 +0800, Hidetoshi Seto wrote:
> >> One question is: if (RIPV,EIPV) = (0,0), is the IP on the stack
> >> really an invalid value, or does it still point to the IP at which the
> >> MCE was generated? I suppose it is not invalid. If a processor
> >> encounters an MCE and is not sure what happened, it will store the IP
> >> on the stack and set neither of the flags.
> >>
> >> If this supposition is correct, the best way is to pick the value on
> >> the stack unconditionally and record the valid flags together with it.
> >
> > According to the spec, the IP on the stack may be unrelated to the MCE
> > if (RIPV,EIPV) = (0,0). So it is meaningless to report it. If you report
> > it unconditionally, you just push the logic to user space or the
> > administrator.
>
> Sorry, I could not find a good page in the spec (Intel64 and IA-32 ASDM)...
> Could you point to one?
>
> I believe that the IP with (RIPV,EIPV) = (1,0) is "not associated with the
> error" too, so is it meaningless to report the IP?

Historical background:

We traditionally reported the RIP only when EIPV=1 (back in 2004 or so,
when I wrote that code). But because most x86 CPUs don't set EIPV, and
don't guarantee that the RIP is related to the error, the RIP was never reported.

But a few people asked for it to be reported anyway, even with EIPV=0, because e.g.
when you get an MCE on MMIO in a driver due to broken hardware, the RIP tends to
still be at or near the MMIO access. So you can see roughly what went wrong.
The code just warns about this case by adding the !INEXACT! marker.
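
For illustration, here is a minimal userspace sketch of that reporting rule. It
is not the kernel code itself: format_mce_rip() is a hypothetical helper and the
output format is only assumed to resemble the kernel's log line. The two
MCG_STATUS bit positions are the ones defined for IA32_MCG_STATUS.

```c
#include <stdint.h>
#include <stdio.h>

/* IA32_MCG_STATUS flag bits. */
#define MCG_STATUS_RIPV (1ULL << 0)  /* restart IP on the stack is valid */
#define MCG_STATUS_EIPV (1ULL << 1)  /* IP on the stack is tied to the error */

/*
 * Hypothetical helper: always format the RIP, but tag it as inexact
 * whenever EIPV is clear, since the IP may then only be near the
 * faulting access rather than exactly at it.
 * Returns what snprintf() returns.
 */
static int format_mce_rip(char *buf, size_t len, uint64_t mcgstatus,
                          unsigned int cs, uint64_t ip)
{
    const char *marker = (mcgstatus & MCG_STATUS_EIPV) ? "" : " !INEXACT!";

    return snprintf(buf, len, "RIP%s %02x:<%016llx>",
                    marker, cs, (unsigned long long)ip);
}
```

With only RIPV set, the formatted line carries the marker; with EIPV set it does
not. Either way the address is still reported, so the reader decides how much to
trust it.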

> If you think so, then the correct fix is to replace the RIPV check with an EIPV check.

Nope.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.