Re: [PATCH v2] smp: Document preemption and stop_machine() mutual exclusion
From: Joel Fernandes
Date: Mon Jul 07 2025 - 10:20:10 EST
On Mon, Jul 07, 2025 at 09:50:50AM +0200, Peter Zijlstra wrote:
> On Sat, Jul 05, 2025 at 01:23:27PM -0400, Joel Fernandes wrote:
> > Recently while revising RCU's cpu online checks, there was some discussion
> > around how IPIs synchronize with hotplug.
> >
> > Add comments explaining how preemption disable creates mutual exclusion with
> > CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
> > atomically updates CPU masks and flushes IPIs with interrupts disabled, and
> > cannot proceed while any CPU (including the IPI sender) has preemption
> > disabled.
>
> I'm very conflicted on this. While the added comments aren't wrong,
> they're not quite accurate either. Stop_machine doesn't wait for people
> to enable preemption as such.
You're right. I did not mean to describe how stop_machine() is supposed to
work. This "trick" for IPI-sending safety is better described as a dependency
on stop_machine(), I suppose.
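To make that dependency concrete, here is a rough userspace model of the race
the comments are about. It is only a sketch, not kernel code: struct model_cpu,
model_take_cpu_down() and model_send_during_offline() are hypothetical names
made up for illustration, and the model assumes the stop_machine() callback
marks the CPU offline and flushes its queue as a single step.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_cpu {
	bool online;	/* stands in for the target's bit in cpu_online_mask */
	int queued;	/* IPI callbacks queued for the target, not yet run */
	int handled;	/* callbacks the target has already run */
};

/* Models the stop_machine() callback: mark offline and flush as one step. */
static void model_take_cpu_down(struct model_cpu *target)
{
	target->online = false;
	target->handled += target->queued;	/* models the final IPI flush */
	target->queued = 0;
}

/*
 * Models one sender racing with a pending offline of the target CPU.
 *
 * With preempt_disabled set, the stopper cannot run on the sender's CPU
 * between the online check and the enqueue, so the offline step is
 * deferred until the sender would re-enable preemption.
 *
 * Returns the number of callbacks stranded on the offline target.
 */
static int model_send_during_offline(bool preempt_disabled)
{
	struct model_cpu target = { .online = true, .queued = 0, .handled = 0 };

	if (!target.online)			/* models the cpu_online() check */
		return 0;

	if (!preempt_disabled)
		model_take_cpu_down(&target);	/* stopper preempts the sender */

	target.queued++;			/* models queueing the csd + IPI */

	if (preempt_disabled)
		model_take_cpu_down(&target);	/* stopper runs after the send */

	return target.queued;
}
```

In this model, model_send_during_offline(true) leaves nothing stranded, while
model_send_during_offline(false) leaves one callback queued on a CPU that has
already gone offline.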
> Fundamentally there seems to be a misconception around what stop machine
> is and how it works, and I don't feel these comments make things better.
Sure, but again, I do not intend to describe how stop_machine() works in this
patch. That would be more ambitious.
> Basically, stop-machine (and stop_one_cpu(), stop_two_cpus()) use the
> stopper task, a task running at the ultimate priority; if it is
> runnable, it will run.
>
> Stop-machine simply wakes all the stopper tasks and co-ordinates them to
> literally stop the machine. All CPUs have the stopper task scheduled and
> then they go sit in a spin-loop driven state machine with IRQs disabled.
Yep.
> There really isn't anything magical about any of this.
So I modified the original patch I sent, mainly removing the comments in the
stop-machine code and reducing the wordiness. Hope this looks good to you now!
---8<-----------------------
From: Joel Fernandes <joelagnelf@xxxxxxxxxx>
Subject: [PATCH] smp: Document preemption and stop_machine() mutual exclusion
Recently while revising RCU's cpu online checks, there was some discussion
around how IPIs synchronize with hotplug.
Add comments explaining how preemption disable creates mutual exclusion with
CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
atomically updates CPU masks and flushes IPIs with interrupts disabled, and
cannot proceed while any CPU (including the IPI sender) has preemption
disabled.
Cc: Andrea Righi <arighi@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
Cc: rcu@xxxxxxxxxxxxxxx
Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Co-developed-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
---
I am leaving in Paul's Ack, but Paul, please let me know if there is a concern!
kernel/smp.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index 974f3a3962e8..957959031063 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -93,6 +93,9 @@ int smpcfd_dying_cpu(unsigned int cpu)
* explicitly (without waiting for the IPIs to arrive), to
* ensure that the outgoing CPU doesn't go offline with work
* still pending.
+ *
+ * This runs with interrupts disabled inside the stopper task invoked
+ * by stop_machine(), ensuring CPU offlining and IPI flushing are atomic.
*/
__flush_smp_call_function_queue(false);
irq_work_run();
@@ -418,6 +421,10 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
*/
static int generic_exec_single(int cpu, call_single_data_t *csd)
{
+ /*
+ * Preemption is already disabled here, so the stopper cannot run on
+ * this CPU. That excludes CPU offlining and the last IPI flush.
+ */
if (cpu == smp_processor_id()) {
smp_call_func_t func = csd->func;
void *info = csd->info;
@@ -638,8 +645,10 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
int err;
/*
- * prevent preemption and reschedule on another processor,
- * as well as CPU removal
+ * Prevent preemption and rescheduling on another processor, as well as
+ * CPU removal. preempt_disable() also keeps the stopper off this CPU,
+ * making the cpu_online() check and the IPI send atomic with respect
+ * to hotplug, so a CPU going offline cannot miss the IPI.
*/
this_cpu = get_cpu();
--
2.34.1