Re: [PATCH] x86/MCE, EDAC/mce_amd: Save all aux registers on SMCA systems

From: Borislav Petkov
Date: Wed Apr 18 2018 - 13:14:12 EST


On Tue, Apr 17, 2018 at 06:30:34PM +0000, Ghannam, Yazen wrote:
> We could but it's an issue of documentation and testing the older systems.
>
> My first pass at this was to unconditionally read the registers because my
> understanding was that registers that aren't accessible would be read-as-zero.
> I thought this was a common MCA implementation. But Tony pointed out that
> this isn't the case on Intel systems. This is the case on recent AMD systems. But
> I don't know if it's the case on older systems which may or may not have
> followed the Intel implementation more closely.

So if our worry is the #GPs, we can always use the rdmsr*_safe()
variants and look at the return value. And dump an invalid value like
0xdeadbeef or so, if the read failed.

But if any bit of info we've gotten this way helps us debug an MCE,
we're already golden!
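
Roughly what I mean, as a userspace sketch so it can be compiled and run
outside the kernel. fake_rdmsrl_safe() is a made-up stand-in for the real
rdmsr*_safe() helpers (0 on success, non-zero when the read would have
faulted); the MSR numbers, the read_aux_reg() helper and the
MCE_INVALID_REG name are all assumptions for illustration only:

```c
#include <stdint.h>

/* Placeholder stored when a register read faults (name is hypothetical). */
#define MCE_INVALID_REG 0xdeadbeefULL

/*
 * Hypothetical stand-in for the kernel's rdmsr*_safe() helpers: returns 0
 * and fills *val on success, non-zero if the access would have raised a #GP.
 * In this mock, only MSR numbers below 8 count as implemented.
 */
static int fake_rdmsrl_safe(uint32_t msr, uint64_t *val)
{
	if (msr >= 8)
		return -1;

	*val = 0x1000ULL + msr;
	return 0;
}

/* Read one aux register; record the placeholder instead of faulting. */
static uint64_t read_aux_reg(uint32_t msr)
{
	uint64_t val;

	if (fake_rdmsrl_safe(msr, &val))
		return MCE_INVALID_REG;

	return val;
}
```

With that, a register which isn't there shows up as 0xdeadbeef in the
record instead of taking the box down, and everything else is logged as
read.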

> For example,
>
> Deferred error occurs:
> - MCA_{STATUS,ADDR,DESTAT,DEADDR} all have valid data.
>
> MCE occurs
> - MCA_{STATUS,ADDR} are overwritten with non-zero data.
> - MCE handler clears MCA_STATUS. MCA_ADDR is non-zero.
>
> DFR handler finds MCA_STATUS[Deferred] is clear, so it saves
> MCA_DESTAT and MCA_DEADDR which is 0.
>
> If !m->addr (which has MCA_DEADDR), then we read MCA_ADDR
> which has the address from the MCE.

The code could use a shorter version of this as a comment to state why
we're doing it. Because it is not obvious.
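
Something along these lines, maybe. This is only a sketch of the fallback
described in the quoted scenario, not the actual patch: struct bank_regs,
deferred_error_addr() and STATUS_DEFERRED are hypothetical names, and the
bit position used for MCA_STATUS[Deferred] is an assumption here:

```c
#include <stdint.h>

/* Hypothetical snapshot of one bank's registers. */
struct bank_regs {
	uint64_t status;	/* MCA_STATUS */
	uint64_t addr;		/* MCA_ADDR   */
	uint64_t destat;	/* MCA_DESTAT */
	uint64_t deaddr;	/* MCA_DEADDR */
};

/* Assumed position of MCA_STATUS[Deferred]. */
#define STATUS_DEFERRED (1ULL << 44)

static uint64_t deferred_error_addr(const struct bank_regs *r)
{
	uint64_t addr;

	/* Deferred error still logged in MCA_STATUS: MCA_ADDR belongs to it. */
	if (r->status & STATUS_DEFERRED)
		return r->addr;

	addr = r->deaddr;

	/*
	 * An MCE can overwrite MCA_{STATUS,ADDR} after the deferred error
	 * was logged, and the MCE handler clears only MCA_STATUS. If
	 * MCA_DEADDR then reads as 0, fall back to MCA_ADDR.
	 */
	if (!addr)
		addr = r->addr;

	return addr;
}
```

The middle comment is about the length I'd expect in the real code.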

Thx.

--
Regards/Gruss,
Boris.
