Re: [PATCH 1/3] x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types.

From: Borislav Petkov
Date: Tue May 11 2021 - 13:27:24 EST


On Tue, May 11, 2021 at 08:55:36PM +0530, Naveen Krishna Chatradhi wrote:
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index e486f96b3cb3..055f3a0acf5e 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -90,6 +90,7 @@ static struct smca_bank_name smca_names[] = {
> [SMCA_CS_V2] = { "coherent_slave", "Coherent Slave" },
> [SMCA_PIE] = { "pie", "Power, Interrupts, etc." },
> [SMCA_UMC] = { "umc", "Unified Memory Controller" },
> + [SMCA_UMC_V2] = { "umc_v2", "Unified Memory Controller" },

So this is called "umc_v2" but the other V2 FUs's strings are the same.
Why?

Also, if you're going to repeat strings, you can just as well group all
those which are the same this way:

[ SMCA_UMC ... SMCA_UMC_V2 ] = { "umc", "Unified Memory Controller" },

and do that for all which have V1 and V2.

I mean, gcc is smart enough to do that behind the scenes for identical
strings but you should do that in C too.

> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 5dd905a3f30c..5515fd9336b1 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -323,6 +323,21 @@ static const char * const smca_umc_mce_desc[] = {
> "AES SRAM ECC error",
> };
>
> +static const char * const smca_umc2_mce_desc[] = {

Ok, gcc reuses the identical string pointers from smca_umc_mce_desc[] so
we should be ok wrt duplication.

> + "DRAM ECC error",
> + "Data poison error",
> + "SDP parity error",
> + "Reserved",
> + "Address/Command parity error",
> + "Write data parity error",
> + "DCQ SRAM ECC error",
> + "Reserved",
> + "Read data parity error",
> + "Rdb SRAM ECC error",
> + "RdRsp SRAM ECC error",
> + "LM32 MP errors",
> +};

...


> +static const char * const smca_xgmipcs_mce_desc[] = {
> + "DataLossErr",
> + "TrainingErr",
> + "FlowCtrlAckErr",
> + "RxFifoUnderflowErr",
> + "RxFifoOverflowErr",
> + "CRCErr",
> + "BERExceededErr",
> + "TxVcidDataErr",
> + "ReplayBufParityErr",
> + "DataParityErr",
> + "ReplayFifoOverflowErr",
> + "ReplayFIfoUnderflowErr",
> + "ElasticFifoOverflowErr",
> + "DeskewErr",
> + "FlowCtrlCRCErr",
> + "DataStartupLimitErr",
> + "FCInitTimeoutErr",
> + "RecoveryTimeoutErr",
> + "ReadySerialTimeoutErr",
> + "ReadySerialAttemptErr",
> + "RecoveryAttemptErr",
> + "RecoveryRelockAttemptErr",
> + "ReplayAttemptErr",
> + "SyncHdrErr",
> + "TxReplayTimeoutErr",
> + "RxReplayTimeoutErr",
> + "LinkSubTxTimeoutErr",
> + "LinkSubRxTimeoutErr",
> + "RxCMDPktErr",

What happened to those and why aren't they proper words like the other
error descriptions?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette