Re: [PATCH 2/2] x86/mce: Add per-bank CMCI storm mitigation

From: Borislav Petkov
Date: Mon Mar 07 2022 - 08:31:29 EST


On Thu, Feb 17, 2022 at 09:36:50AM -0800, Luck, Tony wrote:
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 4f9abb66520d..1f3e7c074182 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -714,6 +714,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
> barrier();
> m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
>
> + mce_intel_storm_tracker(i, m.status);

Why is this called before the VALID bit check?

Because you want to still run the tracker on each polling - not only
when it sees a valid error?

> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
> index cee9d989f791..2ed5634ec277 100644
> --- a/arch/x86/kernel/cpu/mce/intel.c
> +++ b/arch/x86/kernel/cpu/mce/intel.c
> @@ -47,8 +47,48 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
> */
> static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
>
> +/*
> + * CMCI storm tracking state
> + */

Those could use some comments explaining what is tracking what:

> +static DEFINE_PER_CPU(int, stormy_bank_count);
> +static DEFINE_PER_CPU(u64 [MAX_NR_BANKS], bank_history);
> +static DEFINE_PER_CPU(bool [MAX_NR_BANKS], bank_storm);

AFAICT, this says whether a bank is in storm mode?

> +static DEFINE_PER_CPU(unsigned long [MAX_NR_BANKS], bank_time_stamp);

This looks like it collects the jiffies when the bank was looked at in
the storm tracker.

> +static int cmci_threshold[MAX_NR_BANKS];
> +
> #define CMCI_THRESHOLD 1
>
> +/*
> + * High threshold to limit CMCI rate during storms. Max supported is
> + * 0x7FFF. Use this slightly smaller value so it has a distinctive
> + * signature when some asks "Why am I not seeing all corrected errors?"
> + */
> +#define CMCI_STORM_THRESHOLD 0x7FED

Why is a "threshold" in hex?

> +
> +/*
> + * How many errors within the history buffer mark the start of a storm
> + */
> +#define STORM_BEGIN 5

That looks like a STORM_BEGIN_THRESHOLD to me.

> +
> +/*
> + * How many polls of machine check bank without an error before declaring
> + * the storm is over
> + */
> +#define STORM_END 30

Similarly:

STORM_END_POLL_THRESHOLD

> +
> +/*
> + * If there is no poll data for a bank for this amount of time, just
> + * discard the history.
> + */
> +#define STORM_INTERVAL (1 * HZ)

That looks unused.

> +static void cmci_storm_begin(int bank)
> +{
> + __set_bit(bank, this_cpu_ptr(mce_poll_banks));
> + this_cpu_write(bank_storm[bank], true);
> + if (this_cpu_inc_return(stormy_bank_count) == 1)

s/ == 1//

> + mce_timer_kick(true);
> +}
> +
> +static void cmci_storm_end(int bank)
> +{
> + __clear_bit(bank, this_cpu_ptr(mce_poll_banks));
> + this_cpu_write(bank_history[bank], 0ull);
> + this_cpu_write(bank_storm[bank], false);
> + if (this_cpu_dec_return(stormy_bank_count) == 0)

if (!...

> + mce_timer_kick(false);
> +}
> +
> +void mce_intel_storm_tracker(int bank, u64 status)

Function name needs a verb.

> +{
> + unsigned long now = jiffies, delta;
> + unsigned int shift;
> + u64 history;
> +
> + delta = now - this_cpu_read(bank_time_stamp[bank]);
> + shift = this_cpu_read(bank_storm[bank]) ? 1 : (delta + HZBITS) / HZBITS;

Do

shift = 1;

on function entry to simplify this assignment.

Also, I'm having trouble with this shift calculation. The laptop here has
HZ=250. Let's say delta is 2000 jiffies.

So if this bank wasn't in storm mode, I'd have

shift = (2000 + (250 / 64)) / (250 / 64) = 513

...

Aaaha, so only when the diff is < 250 in my case, i.e., it polls the
same bank within a second, only then it would shift the history. I.e.,
what you mean with that "The 64 bit width corresponds to about one
second."

> + history = (shift < 64) ? this_cpu_read(bank_history[bank]) << shift : 0;
> + this_cpu_write(bank_time_stamp[bank], now);
> +
> + if ((status & (MCI_STATUS_VAL | MCI_STATUS_UC)) == MCI_STATUS_VAL)
> + history |= 1;
> + this_cpu_write(bank_history[bank], history);
> +
> + if (this_cpu_read(bank_storm[bank])) {
> + if (history & GENMASK_ULL(STORM_END - 1, 0))
> + return;

Aha, under STORM_END polls you don't declare the storm as being over.

> + pr_notice("CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), bank);
> + cmci_set_threshold(bank, cmci_threshold[bank]);
> + cmci_storm_end(bank);
> + } else {
> + if (hweight64(history) < STORM_BEGIN)
> + return;

Aha, so you need STORM_BEGIN errors within the last second to cause the
storm polling. Ok I guess.

So all in all I can't find anything eeewy in this - it would need to
have a lot more documentation, though, as this is not the most trivial
thing to stare at.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette