RE: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA MCA_DE{STAT,ADDR} registers
From: Ghannam, Yazen
Date: Tue Apr 11 2017 - 08:54:21 EST
> -----Original Message-----
> From: Borislav Petkov [mailto:bp@xxxxxxxxx]
> Sent: Friday, April 07, 2017 5:35 PM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; Tony Luck <tony.luck@xxxxxxxxx>;
> x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA
> MCA_DE{STAT,ADDR} registers
>
> On Fri, Apr 07, 2017 at 08:37:03PM +0000, Ghannam, Yazen wrote:
> > CEs will get picked up by polling or if we hit a threshold.
>
> I think we should be pre-emptive here and simply log the error. No use
> waiting until something polling finally gets its hands on it - we're looking at the
> signature so we might just as well log it.
>
Okay, will do.
> > Okay. But do we need a return value? I'm thinking we can go through
> > all banks and log any and all Deferred errors rather than just the
> > first one we find. I asked and this is the preferred method like how
> > we do in the #MC handler. The same applies to the thresholding interrupt
> handler.
>
> ... and not only the deferred errors but the CEs too.
>
> Which would make the whole code a *lot* simpler. You simply iterate over
> banks:
>
> for_each_bank()
> log_error()
>
> if (smca)
> log_error_from_de_regs()
>
> Purely pseudocode of course.
>
> And there's no need to go and look whether the error is a deferred error or
> whatnot. In the majority of the cases it will be because we're in the #DF
> handler and it better be raised for a #DF.
>
If we do as above then we can possibly log the same deferred error twice. So
even if we log every error we should still check if the error is deferred to decide
whether or not to log what's in the DE* registers.
> But even if we see something else, we should simply log it. As long as it is a
> valid error signature there's nothing wrong with us logging it and clearing the
> regs. The earlier we do so, the lower the probability for setting the overflow
> bit.
>
Okay.
Thanks,
Yazen