Re: mce.c related WARNING: at kernel/timer.c:983 del_timer_sync

From: Andi Kleen
Date: Tue Mar 08 2011 - 13:50:42 EST


> >
> > But, the actual reason is likely some MCE parameter change at boot causing
> > mce_restart() which in turn calls on_each_cpu mce_cpu_restart() which calls
> > del_timer_sync().
>
> Seems we found a real bug.

I don't think it's actually a real bug, because the timer cannot run at
the same time in this state: it's an interrupt that runs with irqs disabled.
Really the only case where it could lead to deadlock is when the timer
runs with irqs on and the other interrupt, the one calling del_timer_sync,
interrupts it. So most likely your new WARN_ON() is catching
lots of innocent code.
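
To make the deadlock condition described above concrete, here is a small
userspace sketch. It is a toy model and not kernel code: struct dl_timer,
its fields and sync_delete_can_deadlock() are hypothetical names invented
for illustration, and the predicate only encodes the condition as stated
in the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: every name here is hypothetical. */
struct dl_timer {
	int  cpu;              /* CPU the timer is queued on */
	bool handler_running;  /* handler is executing on that CPU right now */
	bool handler_irqs_on;  /* handler runs with interrupts enabled */
};

/*
 * A synchronous delete waits until the handler has finished. If the
 * caller is an interrupt that preempted the handler on the same CPU,
 * the handler cannot resume until the interrupt returns, so the wait
 * can never end.
 */
static inline bool sync_delete_can_deadlock(const struct dl_timer *t,
					    int caller_cpu,
					    bool caller_in_irq)
{
	assert(t != NULL);
	return caller_in_irq &&
	       caller_cpu == t->cpu &&
	       t->handler_running &&
	       t->handler_irqs_on;
}
```

In this model a handler that is not running, or that cannot be interrupted,
never satisfies the predicate, which is the argument made above.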

That said I don't think we need the del_timer_sync in mce.c either
for the same reason. The timer is always on the
same CPU, so it cannot run in parallel.
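
As a rough illustration of why the plain variant is enough when the handler
cannot be running, here is a userspace sketch of the two operations. It is
a toy model under stated assumptions, not the kernel implementation:
struct mt_timer, model_del_timer() and model_sync_would_wait() are
hypothetical names. The only kernel behaviour assumed is that del_timer()
deactivates a pending timer without waiting, and that del_timer_sync()
additionally waits for a running handler to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: every name here is hypothetical. */
struct mt_timer {
	bool pending;  /* queued, handler not started yet */
	bool running;  /* handler is executing right now */
};

/*
 * Models the plain delete: dequeue a pending timer and never wait.
 * Returns 1 if a pending timer was deactivated, 0 if it was inactive.
 */
static inline int model_del_timer(struct mt_timer *t)
{
	assert(t != NULL);
	if (!t->pending)
		return 0;
	t->pending = false;
	return 1;
}

/*
 * Models the extra step of the synchronous delete: it only matters
 * when a handler is running at the time of the call.
 */
static inline bool model_sync_would_wait(const struct mt_timer *t)
{
	assert(t != NULL);
	return t->running;
}
```

If the handler is known not to be running, model_sync_would_wait() is
always false and the two variants behave the same in this model.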

Remove del_timer_sync()s in mce.c

All the del_timers happen on the same CPUs as the actual timers, so
the timer handlers cannot run at the same time. Replace them
with plain del_timer()s.

Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d916183..ba7058a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1774,7 +1774,7 @@ static int mce_resume(struct sys_device *dev)

 static void mce_cpu_restart(void *data)
 {
-	del_timer_sync(&__get_cpu_var(mce_timer));
+	del_timer(&__get_cpu_var(mce_timer));
 	if (!mce_available(__this_cpu_ptr(&cpu_info)))
 		return;
 	__mcheck_cpu_init_generic();
@@ -1793,7 +1793,7 @@ static void mce_disable_ce(void *all)
 	if (!mce_available(__this_cpu_ptr(&cpu_info)))
 		return;
 	if (all)
-		del_timer_sync(&__get_cpu_var(mce_timer));
+		del_timer(&__get_cpu_var(mce_timer));
 	cmci_clear();
 }

@@ -2075,7 +2075,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		break;
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
-		del_timer_sync(t);
+		del_timer(t);
 		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
 		break;
 	case CPU_DOWN_FAILED:
