On Wed, Apr 23 2008, Mark Lord wrote:
> Jens Axboe wrote:
>> On Wed, Apr 23 2008, Mark Lord wrote:
>> ..
>>> ..
>>> The second bug, is that for the halt case at least,
>>> nobody waits for the other CPU to actually halt
>>> before continuing.. so we sometimes enter the shutdown
>>> code while other CPUs are still active.
>>>
>>> This causes some machines to hang at shutdown,
>>> unless CPU_HOTPLUG is configured and takes them offline
>>> before we get here.
>>
>> I'm guessing there's a reason it doesn't pass '1' as the last argument,
>> because that would fix that issue?
>
> Undoubtedly -- perhaps the called CPU halts, and therefore cannot reply. :)

Uhm yes, I guess stop_this_cpu() does exactly what the name implies :-)

> But some kind of pre-halt ack, perhaps plus a short delay by the caller
> after receipt of the ack, would probably suffice to kill that bug.
>
> But I really haven't studied this code enough to know,
> other than that it historically has been a sticky area
> to poke around in.

Something like this will close the window to right up until the point
where the other CPUs have 'almost' called halt().
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 5398385..94ec9bf 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -155,8 +155,9 @@ static void stop_this_cpu(void *dummy)
 	/*
 	 * Remove this CPU:
 	 */
-	cpu_clear(smp_processor_id(), cpu_online_map);
 	disable_local_APIC();
+	cpu_clear(smp_processor_id(), cpu_online_map);
+	smp_wmb();
 	if (hlt_works(smp_processor_id()))
 		for (;;) halt();
 	for (;;);
@@ -175,6 +176,12 @@ static void native_smp_send_stop(void)
 	local_irq_save(flags);
 	smp_call_function(stop_this_cpu, NULL, 0, 0);
+
+	while (cpus_weight(cpu_online_map) > 1) {
+		cpu_relax();
+		smp_rmb();
+	}
+
 	disable_local_APIC();
 	local_irq_restore(flags);
 }
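
For anyone who wants to poke at the handshake outside the kernel, here
is a rough userspace model of the same idea. It is only a sketch under
stated assumptions, not kernel code: pthreads stand in for the other
CPUs, a C11 atomic bitmask stands in for cpu_online_map, and the names
online_map, map_weight(), stop_this_thread() and send_stop_and_wait()
are made up for illustration. Each "CPU" does its teardown first and
clears its bit last, and the caller spins until only its own bit is
left.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS 4

/* Hypothetical stand-in for cpu_online_map: bit n set = "CPU" n online. */
static atomic_uint online_map;

/* Count set bits, playing the role of cpus_weight(). */
static int map_weight(unsigned int map)
{
	int n = 0;

	for (; map; map &= map - 1)
		n++;
	return n;
}

/* Stand-in for stop_this_cpu(): tear down first, acknowledge last. */
static void *stop_this_thread(void *arg)
{
	unsigned int cpu = (unsigned int)(uintptr_t)arg;

	assert(cpu < NCPUS);

	/* ... per-CPU teardown would go here ... */

	/* Release ordering publishes the teardown before the ack is seen. */
	atomic_fetch_and_explicit(&online_map, ~(1u << cpu),
				  memory_order_release);
	return NULL;	/* a real CPU would sit in halt() from here on */
}

/* Stand-in for native_smp_send_stop(), run on "CPU" 0. */
static int send_stop_and_wait(void)
{
	pthread_t tid[NCPUS];
	unsigned int cpu;

	/* Mark every "CPU" online before asking the others to stop. */
	atomic_store(&online_map, (1u << NCPUS) - 1);

	for (cpu = 1; cpu < NCPUS; cpu++)
		pthread_create(&tid[cpu], NULL, stop_this_thread,
			       (void *)(uintptr_t)cpu);

	/* Spin until only the caller's own bit is left in the map. */
	while (map_weight(atomic_load_explicit(&online_map,
					       memory_order_acquire)) > 1)
		sched_yield();

	for (cpu = 1; cpu < NCPUS; cpu++)
		pthread_join(tid[cpu], NULL);

	/* Number of "CPUs" still marked online. */
	return map_weight(atomic_load(&online_map));
}
```

Calling send_stop_and_wait() from main() (built with -pthread) should
return 1, i.e. only the caller is still marked online. As with the
patch, the wait has no timeout, so a "CPU" that never clears its bit
would leave the caller spinning forever.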