[PATCH -v2 -tip] fix race between stop_two_cpus and stop_cpus

From: Rik van Riel
Date: Fri Nov 01 2013 - 10:42:35 EST


Subject: fix race between stop_two_cpus and stop_cpus

There is a race between stop_two_cpus, and the global stop_cpus.

It is possible for two CPUs to get their stopper functions queued
"backwards" from one another, resulting in the stopper threads
getting stuck, and the system hanging. This can happen because
queuing up stoppers is not synchronized.

This patch adds synchronization between stop_cpus (a rare operation),
and stop_two_cpus.

Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
---
v2: use lglock, as suggested by Peter & Mel, thanks to both of you
for insisting on nicer code :)

kernel/stop_machine.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 32a6c44..c5c1af1 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -20,6 +20,7 @@
#include <linux/kallsyms.h>
#include <linux/smpboot.h>
#include <linux/atomic.h>
+#include <linux/lglock.h>

/*
* Structure to determine completion condition and record errors. May
@@ -43,6 +44,14 @@ static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
static bool stop_machine_initialized = false;

+/*
+ * Avoids a race between stop_two_cpus and global stop_cpus, where
+ * the stoppers could get queued up in reverse order, leading to
+ * system deadlock. Using an lglock means stop_two_cpus remains
+ * relatively cheap.
+ */
+DEFINE_STATIC_LGLOCK(stop_cpus_lock);
+
static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
{
memset(done, 0, sizeof(*done));
@@ -268,8 +277,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
*/
call_cpu = min(cpu1, cpu2);

+ lg_local_lock(&stop_cpus_lock);
smp_call_function_single(call_cpu, &irq_cpu_stop_queue_work,
&call_args, 0);
+ lg_local_unlock(&stop_cpus_lock);

wait_for_completion(&done.completion);
return done.executed ? done.ret : -ENOENT;
@@ -319,10 +330,10 @@ static void queue_stop_cpus_work(const struct cpumask *cpumask,
* preempted by a stopper which might wait for other stoppers
* to enter @fn which can lead to deadlock.
*/
- preempt_disable();
+ lg_global_lock(&stop_cpus_lock);
for_each_cpu(cpu, cpumask)
cpu_stop_queue_work(cpu, &per_cpu(stop_cpus_work, cpu));
- preempt_enable();
+ lg_global_unlock(&stop_cpus_lock);
}

static int __stop_cpus(const struct cpumask *cpumask,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/