[PATCH] Make sure timers have migrated before killing migration_thread

From: Amit K. Arora
Date: Wed May 19 2010 - 05:07:24 EST

Problem : In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system appears
to hang at the point where migration_call() tries to kill the migration_thread
of the dying CPU, which has just been moved to the current CPU. This migration
thread does not get a chance to run (and die) since rt_throttled is set to 1
on the current CPU, and it doesn't get cleared because the hrtimer which is
supposed to reset the rt bandwidth (sched_rt_period_timer) is tied to the CPU
being offlined.
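
The dependency described above can be sketched as a small user-space model.
This is only an illustration under simplifying assumptions: the model_* types
and functions are hypothetical and are not the kernel's data structures. It
shows that a throttled RT thread can exit only after the period timer has
fired, and that the timer cannot fire while it sits on an offline CPU.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel's real structures. */
struct model_timer {
	int cpu;                /* CPU the period timer is queued on */
};

struct model_rq {
	bool rt_throttled;      /* RT tasks may not run while this is set */
};

/* Assumption of the model: a timer fires only on an online CPU. */
static bool model_timer_can_fire(const struct model_timer *t,
				 const bool *cpu_online)
{
	return cpu_online[t->cpu];
}

/* Period timer handler: clears the throttle if the timer can fire. */
static void model_period_timer_tick(struct model_rq *rq,
				    const struct model_timer *t,
				    const bool *cpu_online)
{
	if (model_timer_can_fire(t, cpu_online))
		rq->rt_throttled = false;
}

/* The migration thread is an RT task; it can exit only if it can run. */
static bool model_migration_thread_can_exit(const struct model_rq *rq)
{
	return !rq->rt_throttled;
}
```

With the timer left on the offline CPU the thread never becomes runnable in
this model; once the timer is moved to an online CPU, the next tick clears
the throttle.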

Solution : This patch defers killing the migration thread to the
"CPU_POST_DEAD" event. By then all the timers (including
sched_rt_period_timer) should have been migrated (along with other callbacks).
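
Why the later event helps can be sketched with a toy notifier chain. This is
a user-space model, not kernel code: the model_* names are hypothetical, and
the only property assumed from the hotplug core is that every callback sees
the "dead" event before any callback sees the "post dead" event.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CPU_DEAD and CPU_POST_DEAD. */
enum model_event { MODEL_CPU_DEAD, MODEL_CPU_POST_DEAD };

struct model_state {
	bool timers_migrated;   /* timer callback has handled MODEL_CPU_DEAD */
	bool thread_stopped;    /* migration thread was stopped safely */
	bool stop_would_hang;   /* stop attempted before timers were migrated */
};

typedef void (*model_notifier_fn)(struct model_state *, enum model_event);

/* Scheduler callback as patched: stop the thread only on POST_DEAD. */
static void model_sched_notifier(struct model_state *s, enum model_event ev)
{
	if (ev != MODEL_CPU_POST_DEAD)
		return;
	if (s->timers_migrated)
		s->thread_stopped = true;
	else
		s->stop_would_hang = true;
}

/* Timer callback: move timers off the dead CPU on DEAD. */
static void model_hrtimer_notifier(struct model_state *s, enum model_event ev)
{
	if (ev == MODEL_CPU_DEAD)
		s->timers_migrated = true;
}

/* Deliver each event to the whole chain before sending the next event. */
static void model_cpu_down(struct model_state *s)
{
	/* Scheduler callback runs first, as in the failing scenario. */
	static const model_notifier_fn chain[] = {
		model_sched_notifier, model_hrtimer_notifier,
	};
	static const enum model_event events[] = {
		MODEL_CPU_DEAD, MODEL_CPU_POST_DEAD,
	};
	size_t e, i;

	for (e = 0; e < sizeof(events) / sizeof(events[0]); e++)
		for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
			chain[i](s, events[e]);
}
```

Even though the scheduler callback runs ahead of the timer callback within
each event, the stop happens only after the timers have been migrated in
this model.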

Alternate Solution considered : Another option considered was to
increase the priority of the hrtimer cpu offline notifier, such that it
gets to run before the scheduler's migration cpu offline notifier. That
way we can be sure that the timers will get migrated before migration_call
tries to kill migration_thread. However, as Srivatsa pointed out, this can
have some non-obvious implications.
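
The alternative relies on notifier callbacks running in descending priority
order. A minimal user-space sketch of that ordering follows; the struct, the
names, and the priority values used with it are illustrative assumptions, not
the kernel's notifier_block or its actual priorities.

```c
#include <stdlib.h>

/* Hypothetical notifier entry: a name plus a priority. */
struct model_nb {
	const char *name;
	int priority;           /* higher value runs earlier */
};

/* qsort comparator: descending priority. */
static int model_by_priority_desc(const void *a, const void *b)
{
	const struct model_nb *x = a;
	const struct model_nb *y = b;

	return (y->priority > x->priority) - (y->priority < x->priority);
}

/* Order the chain so that index 0 is the first callback to run. */
static void model_sort_chain(struct model_nb *chain, size_t n)
{
	qsort(chain, n, sizeof(*chain), model_by_priority_desc);
}
```

Raising the timer entry's priority above the scheduler entry's moves it to
the front of the chain, which is the effect the alternative was after.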

Testing : Without the patch, the stress tests did not last even 12
hours, and the problem was reproducible. With the patch applied, the
tests ran successfully for more than 48 hours.

Amit Arora

Signed-off-by: Amit Arora <aarora@xxxxxxxxxx>
Signed-off-by: Gautham R Shenoy <ego@xxxxxxxxxx>
diff -Nuarp linux-2.6.34.org/kernel/sched.c linux-2.6.34/kernel/sched.c
--- linux-2.6.34.org/kernel/sched.c 2010-05-18 22:56:21.000000000 -0700
+++ linux-2.6.34/kernel/sched.c 2010-05-18 22:58:31.000000000 -0700
@@ -5942,14 +5942,26 @@ migration_call(struct notifier_block *nf
cpu_rq(cpu)->migration_thread = NULL;

+ case CPU_POST_DEAD:
+ /*
+ Bring the migration thread down in CPU_POST_DEAD event,
+ since the timers should have got migrated by now and thus
+ we should not see a deadlock between trying to kill the
+ migration thread and the sched_rt_period_timer.
+ */
+ cpuset_lock();
+ rq = cpu_rq(cpu);
+ kthread_stop(rq->migration_thread);
+ put_task_struct(rq->migration_thread);
+ rq->migration_thread = NULL;
+ cpuset_unlock();
+ break;
case CPU_DEAD:
cpuset_lock(); /* around calls to cpuset_cpus_allowed_lock() */
rq = cpu_rq(cpu);
- kthread_stop(rq->migration_thread);
- put_task_struct(rq->migration_thread);
- rq->migration_thread = NULL;
/* Idle task back to normal (off runqueue, low prio) */
