Re: [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides

From: Nikolay Borisov
Date: Wed Jun 25 2025 - 09:29:27 EST




On 6/24/25 17:15, Yazen Ghannam wrote:
Users can disable MCA polling by setting the "ignore_ce" parameter or by
setting "check_interval=0". This tells the kernel to *not* start the MCE
timer on a CPU.

If the user did not disable CMCI, then storms can occur. When these
happen, the MCE timer will be started with a fixed interval. After the
storm subsides, the timer's next interval is set to check_interval.

I think the subject of the patch doesn't do justice to its content. What this change actually does is ensure that, when a CMCI storm subsides, the timer function honors CE handling having been disabled via either ignore_ce or check_interval=0. So a subject along the lines of:

"Ensure user settings are considered when CMCI storm subsides"

or something like that would be more descriptive of what you are doing.

At the very least, you are not restoring anything: even without this patch, when the storm subsided you'd start the timer with a value of 'iv'.
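
To make the point concrete, here is a rough userspace model of the re-arm decision at the end of the timer function, with and without the patch. It is only a sketch: struct mca_cfg_model, rearm_after_poll(), the 'patched' flag and the HZ value are made up for illustration and are not kernel code; only should_enable_timer() mirrors the helper added by the patch.

```c
#include <stdbool.h>

#define HZ 250UL /* assumed tick rate, for illustration only */

/* Hypothetical stand-in for the relevant bit of the kernel's mca_cfg. */
struct mca_cfg_model {
	bool ignore_ce;
};

/* Same condition as the helper added by the patch. */
static bool should_enable_timer(const struct mca_cfg_model *cfg,
				unsigned long iv)
{
	return !cfg->ignore_ce && iv;
}

/*
 * Model of the tail of the timer function. Returns true if the timer
 * is re-armed and stores the chosen delay in *delay.
 * 'patched' selects the behavior with (true) or without (false) the
 * should_enable_timer() check.
 */
static bool rearm_after_poll(const struct mca_cfg_model *cfg, bool storm,
			     unsigned long iv, bool patched,
			     unsigned long *delay)
{
	if (storm) {
		/* During a storm, poll at a fixed interval. */
		*delay = HZ;
		return true;
	}

	/* With the patch, user settings are checked once the storm ends. */
	if (patched && !should_enable_timer(cfg, iv))
		return false;

	/* Without the patch, the timer is always re-armed with 'iv'. */
	*delay = iv;
	return true;
}
```

In this model, with ignore_ce set and no storm, the unpatched path still re-arms the timer with 'iv', while the patched path leaves it stopped. Nothing is restored in either case; the user settings are simply taken into account.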


This disregards the user's input through "ignore_ce" and
"check_interval". Furthermore, if "check_interval=0", then the new timer
will run faster than expected.

Create a new helper to check these conditions and use it when a CMCI
storm ends.

Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---

Notes:
Link: https://lore.kernel.org/r/20250415-wip-mca-updates-v3-17-8ffd9eb4aa56@xxxxxxx
v3->v4:
* Update commit message.
* Move to beginning of set.
* Note: Polling vs thresholding use case updates not yet addressed.
v2->v3:
* New in v3.

arch/x86/kernel/cpu/mce/core.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 07d61937427f..ae2e2d8ec99b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1740,6 +1740,11 @@ static void mc_poll_banks_default(void)
 void (*mc_poll_banks)(void) = mc_poll_banks_default;
+static bool should_enable_timer(unsigned long iv)
+{
+	return !mca_cfg.ignore_ce && iv;
+}
+
 static void mce_timer_fn(struct timer_list *t)
 {
 	struct timer_list *cpu_t = this_cpu_ptr(&mce_timer);
@@ -1763,7 +1768,7 @@ static void mce_timer_fn(struct timer_list *t)
 	if (mce_get_storm_mode()) {
 		__start_timer(t, HZ);
-	} else {
+	} else if (should_enable_timer(iv)) {
 		__this_cpu_write(mce_next_interval, iv);
 		__start_timer(t, iv);
 	}
@@ -2156,7 +2161,7 @@ static void mce_start_timer(struct timer_list *t)
 {
 	unsigned long iv = check_interval * HZ;
-	if (mca_cfg.ignore_ce || !iv)
+	if (!should_enable_timer(iv))
 		return;
 	this_cpu_write(mce_next_interval, iv);