[PATCH] clockevents: Fix cpu_down() race for hrtimer based broadcasting

From: Preeti U Murthy
Date: Mon Mar 30 2015 - 05:29:19 EST


A CPU hotplug stress test on POWER caused the machine to hit either
softlockups or rcu_sched stall warnings. The issue was traced to
commit:

7cba160ad789 ("powernv/cpuidle: Redesign idle states management")

which exposed the cpu_down() race with the hrtimer based broadcast
mode introduced by:

5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")

The race is the following:

Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
before it is taken down.

CPU0                                    CPU1

cpu_down()                              take_cpu_down()
                                        disable_interrupts()

cpu_die()

while (CPU1 != CPU_DEAD) {
  msleep(100);
    switch_to_idle();
      stop_cpu_timer();
        schedule_broadcast();
}

tick_cleanup_cpu_dead()
        take_over_broadcast()

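Once CPU1 has disabled interrupts it can no longer handle the
broadcast hrtimer. CPU0, which has gone idle inside the wait loop and
depends on that broadcast to wake up, never reaches
tick_cleanup_cpu_dead() and is stuck forever.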

Fix this by explicitly taking over broadcast duty before cpu_die().
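
The ordering can be illustrated with a small user-space toy model. This
is only a sketch and not the kernel change itself: every name in it
(toy_cpu, toy_system, toy_cpu_down() and so on) is hypothetical, and it
assumes that a sleeping CPU is woken only if the current broadcast owner
still has interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or functions exist in the kernel. */
struct toy_cpu {
	bool irqs_enabled;	/* can this CPU still run the broadcast hrtimer? */
	bool dead;		/* has this CPU reached the CPU_DEAD state? */
};

struct toy_system {
	struct toy_cpu cpu[2];
	int broadcast_owner;	/* CPU holding the hrtimer broadcast duty */
};

/* Initial state from the changelog: CPU1 holds the broadcast duty. */
static struct toy_system toy_initial_state(void)
{
	struct toy_system s = {
		.cpu = { { true, false }, { true, false } },
		.broadcast_owner = 1,
	};
	return s;
}

/* Assumption: the wakeup arrives only if the owner can take interrupts. */
static bool toy_broadcast_wakeup_arrives(const struct toy_system *s)
{
	return s->cpu[s->broadcast_owner].irqs_enabled;
}

static void toy_take_over_broadcast(struct toy_system *s, int new_owner)
{
	s->broadcast_owner = new_owner;
}

/*
 * Models CPU0 taking CPU1 down. Returns true if CPU0 gets past the
 * wait loop, false if it would sleep forever.
 */
static bool toy_cpu_down(struct toy_system *s, bool takeover_before_die)
{
	/* take_cpu_down() on CPU1: interrupts are disabled. */
	s->cpu[1].irqs_enabled = false;

	/* The fix: pull the broadcast duty over before waiting. */
	if (takeover_before_die)
		toy_take_over_broadcast(s, 0);

	/* cpu_die() wait loop: CPU0 idles and relies on the broadcast. */
	if (!toy_broadcast_wakeup_arrives(s))
		return false;

	s->cpu[1].dead = true;

	/* tick_cleanup_cpu_dead(): only reached if CPU0 woke up. */
	toy_take_over_broadcast(s, 0);
	return true;
}
```

Calling toy_cpu_down() on a fresh toy_initial_state() with
takeover_before_die set to false models the stuck case, and with true
it models the fixed ordering.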

This is a temporary workaround. What we really want is a callback
in the clockevent device which allows us to do that from the dying
CPU by pushing the hrtimer onto a different CPU. That might involve
an IPI and is definitely more complex than this immediate fix.

Changelog was picked up from:

https://lkml.org/lkml/2015/2/16/213

Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Tested-by: Nicolas Pitre <nico@xxxxxxxxxx>
Signed-off-by: Preeti U. Murthy <preeti@xxxxxxxxxxxxxxxxxx>
Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
Cc: mpe@xxxxxxxxxxxxxx
Cc: nicolas.pitre@xxxxxxxxxx
Cc: peterz@xxxxxxxxxxxxx
Cc: rjw@xxxxxxxxxxxxx
Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html