Re: [PATCH] stop_machine() vs. synchronous IPI send deadlock

From: Kirill Korotaev
Date: Wed Nov 09 2005 - 12:32:42 EST


Sorry, the hunk with the corresponding preempt_enable was lost; sending the patch again.

This patch fixes a deadlock between stop_machine() and a synchronous IPI send.
The problem is that stop_machine() disables interrupts on the calling CPU before preemption has been disabled on the other CPUs. So if another CPU is preempted and then calls something like flush_tlb_all(), it will deadlock with the CPU doing stop_machine(): that CPU can't process the IPI because its IRQs are disabled, while it in turn waits for the other CPUs to acknowledge the state change.

I changed stop_machine() so that the calling CPU goes through exactly the same sequence as the other CPUs, i.e. preemption is disabled first on _all_ CPUs, including the caller, and only after that are IRQs disabled.
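To make the ordering argument above concrete, here is a toy user-space model of the two sequences. It is only a sketch of my reading of the description, not kernel code: struct cpu_model, ipi_acked(), other_task_sends_ipi(), ack_prepare() and deadlocks() are all hypothetical names invented for the illustration, and each CPU is reduced to three flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: each CPU is reduced to three flags. */
struct cpu_model {
	bool irqs_off;    /* CPU cannot service an incoming IPI */
	bool preempt_off; /* no other task can be scheduled on this CPU */
	bool spinning;    /* stuck waiting for an IPI acknowledgement */
};

/* A synchronous IPI is acknowledged only if the target has IRQs enabled. */
static bool ipi_acked(const struct cpu_model *target)
{
	return !target->irqs_off;
}

/*
 * Some other task on 'other' issues a flush_tlb_all()-style synchronous
 * IPI to 'initiator'. It can only run while 'other' is still preemptible.
 */
static void other_task_sends_ipi(struct cpu_model *other,
				 const struct cpu_model *initiator)
{
	if (other->preempt_off)
		return; /* the stopmachine thread owns the CPU */
	if (!ipi_acked(initiator))
		other->spinning = true; /* waits forever for the ack */
}

/* The stopmachine thread on 'other' acks PREPARE only if it gets to run. */
static bool ack_prepare(struct cpu_model *other)
{
	if (other->spinning)
		return false;
	other->preempt_off = true;
	return true;
}

/* Returns true if the modelled sequence ends in a deadlock. */
static bool deadlocks(bool irqs_off_before_prepare)
{
	struct cpu_model initiator = { false, false, false };
	struct cpu_model other = { false, false, false };

	if (irqs_off_before_prepare)
		initiator.irqs_off = true;    /* old order */
	else
		initiator.preempt_off = true; /* patched order */

	/* Worst case: the other task runs before PREPARE is acknowledged. */
	other_task_sends_ipi(&other, &initiator);
	if (!ack_prepare(&other))
		return true; /* initiator waits for an ack that never comes */

	/* All CPUs are non-preemptible now, so masking IRQs is safe. */
	initiator.irqs_off = true;
	other_task_sends_ipi(&other, &initiator); /* no-op: preempt_off */
	return other.spinning;
}
```

In this model deadlocks(true) corresponds to the old order (IRQs off before PREPARE) and deadlocks(false) to the patched order. The model only covers a single other CPU and a single badly timed IPI, which is enough to show why the order matters.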

Signed-off-by: Kirill Korotaev <dev@xxxxx>

Kirill

--- ./kernel/stop_machine.c.stpmach	2005-11-01 12:06:03.000000000 +0300
+++ ./kernel/stop_machine.c 2005-11-09 20:38:23.000000000 +0300
@@ -114,13 +114,12 @@ static int stop_machine(void)
 		return ret;
 	}

-	/* Don't schedule us away at this point, please. */
-	local_irq_disable();
-
 	/* Now they are all started, make them hold the CPUs, ready. */
+	preempt_disable();
 	stopmachine_set_state(STOPMACHINE_PREPARE);

 	/* Make them disable irqs. */
+	local_irq_disable();
 	stopmachine_set_state(STOPMACHINE_DISABLE_IRQ);

 	return 0;
@@ -130,6 +129,7 @@ static void restart_machine(void)
 {
 	stopmachine_set_state(STOPMACHINE_EXIT);
 	local_irq_enable();
+	preempt_enable_no_resched();
 }

 struct stop_machine_data