Re: [PATCH v5 2/2] x86/mce: Add support for Extended Physical Address MCA changes

From: Luck, Tony
Date: Fri Apr 15 2022 - 12:38:09 EST


On Fri, Apr 15, 2022 at 02:56:54PM +0000, Yazen Ghannam wrote:
> 3) OS, or optionally BIOS, polls MCA banks and logs any valid errors.
> a) Since MCi_CTL, etc. are cleared due to reset, any errors detected are
> from before the reset.

On Intel not quite any error. H/w can still log to a bank but MCi_STATUS.EN bit
will be zero. We've also had some BIOS code that did things that logged errors
and then left them for the OS to find during boot.

But this sequence does give more confidence that errors found in banks duing
boot are "old".

> I agree. The Intel SDM and AMD APM have the following procedure, in summary.
>
> 1) Set MCG_CTL
> 2) Set MCi_CTL for all banks
> 3) Read MCi_STATUS and log valid errors.
> 4) Clear MCi_STATUS
> 5) Set CR4.MCE

Yes. That's what the pseudo-code in Intel SDM Example 15-1 says :-(
>
> I don't know of a reason why STATUS needs to be cleared after MCi_CTL is set.
> The only thing I can think of is that enabling MCi_CTL may cause spurious info
> logged in MCi_STATUS, and that needs to be cleared out. I'm asking AMD folks
> about it.
>
> Of course, this contradicts the flow I outlined above, and also the flow given
> in the AMD Processor Programming Reference (PPR). I wonder if the
> architectural documents have gotten stale compared to current guidelines. I'm
> asking about this too.

I will ask architects about this sequence too.

-Tony