Re: [PATCH v2 1/2] cper, apei, mce: Pass x86 CPER through the MCA handling chain

From: Punit Agrawal
Date: Mon Aug 31 2020 - 01:28:07 EST


Hi Smita,

A couple of comments below -

Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx> writes:

> Linux Kernel uses ACPI Boot Error Record Table (BERT) to report fatal
> errors that occurred in a previous boot. The MCA errors in the BERT are
> reported using the x86 Processor Error Common Platform Error Record (CPER)
> format. Currently, the record prints out the raw MSR values and AMD relies
> on the raw record to provide MCA information.
>
> Extract the raw MSR values of MCA registers from the BERT and feed it into
> the standard mce_log() function through the existing x86/MCA RAS
> infrastructure. This will result in better decoding from the EDAC MCE
> decoder or the default notifier.
>
> The implementation is SMCA specific as the raw MCA register values are
> given in the register offset order of the MCAX address space.
>
> Reported-by: kernel test robot <lkp@xxxxxxxxx>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
> ---

[...]


> diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c
> index 2531de49f56c..374b8e18552a 100644
> --- a/drivers/firmware/efi/cper-x86.c
> +++ b/drivers/firmware/efi/cper-x86.c
> @@ -1,7 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> // Copyright (C) 2018, Advanced Micro Devices, Inc.
>
> -#include <linux/cper.h>

Why is the include dropped? AFAICT, the definitions from there are still
being used after this patch.

> +#include <acpi/apei.h>
>
> /*
> * We don't need a "CPER_IA" prefix since these are all locally defined.
> @@ -347,9 +347,11 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc)
> ctx_info->mm_reg_addr);
> }
>
> - printk("%sRegister Array:\n", newpfx);
> - print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, groupsize,
> - (ctx_info + 1), ctx_info->reg_arr_size, 0);
> + if (arch_apei_report_x86_error(ctx_info, proc->lapic_id)) {
> + printk("%sRegister Array:\n", newpfx);
> + print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, groupsize,
> + (ctx_info + 1), ctx_info->reg_arr_size, 0);
> + }
>
> ctx_info = (struct cper_ia_proc_ctx *)((long)ctx_info + size);
> }
> diff --git a/include/acpi/apei.h b/include/acpi/apei.h
> index 680f80960c3d..44d4d08acce0 100644
> --- a/include/acpi/apei.h
> +++ b/include/acpi/apei.h
> @@ -33,8 +33,15 @@ extern bool ghes_disable;
>
> #ifdef CONFIG_ACPI_APEI
> void __init acpi_hest_init(void);
> +int arch_apei_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
> + u64 lapic_id);
> #else
> static inline void acpi_hest_init(void) { return; }
> +static inline int arch_apei_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
> + u64 lapic_id)
> +{
> + return -EINVAL;
> +}
> #endif

Adding the declaration to this include violates the separation of
generic and architecture specific code.

Can this be moved to the appropriate architecture specific header?
Perhaps arch/x86/include/asm/apei.h.

> typedef int (*apei_hest_func_t)(struct acpi_hest_header *hest_hdr, void *data);
> @@ -51,6 +58,8 @@ int erst_clear(u64 record_id);
>
> int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data);
> void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
> +int arch_apei_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
> + u64 lapic_id);


Why is the additional declaration needed?

Thanks,
Punit

>
> #endif
> #endif