Re: [PATCH v2 2/2] x86/mce: Add messages to describe panic machine errors on AMD's MCEs grading

From: Yazen Ghannam
Date: Tue Apr 05 2022 - 16:17:15 EST


On Thu, Mar 31, 2022 at 11:38:50AM -0500, Carlos Bilbao wrote:

...

> static noinstr int mce_severity_amd(struct mce *m, struct pt_regs *regs, char **msg, bool is_excp)
> {
> int ret;
> + char *panic_msg;

Order variable lines from longest to shortest.

And the pointer should be initialized to NULL, as Mike also said.
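
Roughly what I mean, as a minimal userspace sketch rather than the actual
kernel function. sketch_severity(), its pcc parameter, and the 0/1 return
values are made up for illustration. I'm also assuming the message gets
handed back through *msg at the end of the function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for mce_severity_amd(); not kernel code. */
static int sketch_severity(bool pcc, const char **msg)
{
	const char *panic_msg = NULL;	/* longer line first, and initialized */
	int ret = 0;			/* 0 and 1 are placeholder severities */

	if (pcc) {
		panic_msg = "Processor Context Corrupt";
		ret = 1;
	}

	/* Without the NULL initializer this would read an indeterminate value when !pcc. */
	if (msg)
		*msg = panic_msg;

	return ret;
}
```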

>
> /*
> * Default return value: Action required, the error must be handled
> @@ -316,6 +317,7 @@ static noinstr int mce_severity_amd(struct mce *m, struct pt_regs *regs, char **
>
> /* Processor Context Corrupt, no need to fumble too much, die! */
> if (m->status & MCI_STATUS_PCC) {
> + panic_msg = "Processor Context Corrupt";
> ret = MCE_PANIC_SEVERITY;
> goto amd_severity;
> }
> @@ -339,16 +341,21 @@ static noinstr int mce_severity_amd(struct mce *m, struct pt_regs *regs, char **
>
> if (((m->status & MCI_STATUS_OVER) && !mce_flags.overflow_recov)
> || !mce_flags.succor) {
> + panic_msg = "Uncorrected unrecoverable error";

So these two cases should definitely be separate. One is "Overflowed
uncorrected error without MCA Overflow Recovery", and the other is
"Uncorrected error without MCA Recovery".
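
Something along these lines, perhaps. This is an untested userspace sketch
and not a drop-in for the patch. SKETCH_STATUS_OVER and struct sketch_flags
are stand-ins for MCI_STATUS_OVER and the two mce_flags bits. Checking the
overflow case first is my assumption, based on the order in the existing
condition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's MCI_STATUS_OVER (bit 62 of MCA_STATUS). */
#define SKETCH_STATUS_OVER (1ULL << 62)

/* Hypothetical mirror of the two mce_flags bits used in the check. */
struct sketch_flags {
	bool overflow_recov;
	bool succor;
};

/* Returns the panic message, or NULL if neither panic case applies. */
static const char *sketch_panic_msg(uint64_t status, struct sketch_flags f)
{
	/* Overflow set, but the hardware can't recover from overflow. */
	if ((status & SKETCH_STATUS_OVER) && !f.overflow_recov)
		return "Overflowed uncorrected error without MCA Overflow Recovery";

	/* No MCA Recovery (SUCCOR) support at all. */
	if (!f.succor)
		return "Uncorrected error without MCA Recovery";

	return NULL;
}
```

In the real function each branch would also set ret = MCE_PANIC_SEVERITY
and jump to amd_severity, same as the single branch does today.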

> ret = MCE_PANIC_SEVERITY;
> goto amd_severity;
> }
>
> if (error_context(m, regs) == IN_KERNEL) {
> + panic_msg = "Uncorrected error in kernel context";

This should be "Uncorrected unrecoverable error in kernel context". There is
the IN_KERNEL_RECOV error context for a recoverable error in kernel context.
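
To spell out the distinction, here is a small userspace sketch. The
sketch_context enum is a made-up stand-in for what error_context() returns,
and its values are not meant to match the kernel's. Only the plain
in-kernel case gets the panic message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the contexts returned by error_context(). */
enum sketch_context {
	SKETCH_IN_KERNEL,
	SKETCH_IN_USER,
	SKETCH_IN_KERNEL_RECOV,
};

/* Returns the panic message for a context, or NULL if it does not panic here. */
static const char *sketch_context_msg(enum sketch_context ctx)
{
	/* The recoverable kernel case is a separate context, so be explicit. */
	if (ctx == SKETCH_IN_KERNEL)
		return "Uncorrected unrecoverable error in kernel context";

	return NULL;
}
```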

Thanks,
Yazen