Re: [RFC] [x86]: abort secondary cpu bringup gracefully

From: Peter Zijlstra
Date: Sat May 12 2012 - 13:39:24 EST


On Sat, 2012-05-12 at 21:32 +0200, Igor Mammedov wrote:

> @@ -232,12 +233,36 @@ static void __cpuinit smp_callin(void)
> set_cpu_sibling_map(raw_smp_processor_id());
> wmb();
>
> - notify_cpu_starting(cpuid);
> -
> /*
> * Allow the master to continue.
> */
> cpumask_set_cpu(cpuid, cpu_callin_mask);
> +
> + /*
> + * Wait for master to continue.
> + */
> + for (timeout = 0; timeout < 50000; timeout++) {
> + if (cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
> + break;
> +
> + if (!cpumask_test_cpu(cpuid, cpu_callout_mask))
> + break;
> +
> + udelay(100);
> + }
> +
> + if (!cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
> + goto die;
> +
> + notify_cpu_starting(cpuid);

It's absolutely broken to call CPU_STARTING after the master CPU has been
told to continue. Once the master returns from cpu_up(), it assumes the
secondary is completely initialized and ready to run.
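To make the ordering argument concrete, here is a minimal sequential model. It is a hypothetical userspace sketch, not kernel code. It assumes the master leaves cpu_up() the moment it observes the AP's bit in cpu_callin_mask, which is the worst legal interleaving once the master has been released. The names ap_step, AP_NOTIFY_STARTING, AP_SET_CALLIN, ap_ready_when_master_released(), mainline_order and patched_order are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the AP side of smp_callin(); not kernel code.
 * The master is assumed to leave cpu_up() as soon as it sees the
 * AP's callin bit.
 */
enum ap_step {
    AP_NOTIFY_STARTING,  /* models notify_cpu_starting(cpuid) */
    AP_SET_CALLIN,       /* models cpumask_set_cpu(cpuid, cpu_callin_mask) */
};

/* Returns true if the CPU_STARTING work had already run at the moment
 * the master was released, i.e. the AP really was ready to run. */
static bool ap_ready_when_master_released(const enum ap_step *steps,
                                          size_t n)
{
    bool starting_done = false;

    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (steps[i]) {
        case AP_NOTIFY_STARTING:
            starting_done = true;
            break;
        case AP_SET_CALLIN:
            /* The master may return from cpu_up() right here. */
            return starting_done;
        }
    }
    return false;   /* the AP never released the master */
}

/* Existing order in smp_callin(): notifiers first, then callin. */
static const enum ap_step mainline_order[] = {
    AP_NOTIFY_STARTING, AP_SET_CALLIN,
};

/* Order proposed by the patch: callin first, notifiers later. */
static const enum ap_step patched_order[] = {
    AP_SET_CALLIN, AP_NOTIFY_STARTING,
};
```

With mainline_order the function returns true; with patched_order it returns false, because the callin bit is published before the CPU_STARTING work has run.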

> + return;
> +
> +die:

You've forgotten to clean up the bits set by set_cpu_sibling_map().
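A rough model of what the die: path would have to undo, again as a hypothetical userspace sketch. Here sibling_map, core_map and sibling_setup_map loosely stand in for the per-CPU sibling/core masks and cpu_sibling_setup_mask. The model ignores the LLC-shared mask and booted_cores bookkeeping that set_cpu_sibling_map() also touches. The point it illustrates is that the bits are set symmetrically, so cleanup has to remove the dying CPU from every peer's mask, not only clear its own.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8   /* hypothetical small system */

/* One bit per CPU. Hypothetical stand-ins for the sibling/core masks
 * and cpu_sibling_setup_mask; not the kernel's data structures. */
static uint32_t sibling_map[MODEL_NR_CPUS];
static uint32_t core_map[MODEL_NR_CPUS];
static uint32_t sibling_setup_map;

/* Topology ids per CPU, filled in by the caller (default 0). */
static int pkg_id[MODEL_NR_CPUS];
static int core_id[MODEL_NR_CPUS];

/* Model of set_cpu_sibling_map(): link @cpu with every already set-up
 * CPU in the same package, in both directions (including itself). */
static void model_set_sibling_map(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    sibling_setup_map |= 1u << cpu;
    for (int i = 0; i < MODEL_NR_CPUS; i++) {
        if (!(sibling_setup_map & (1u << i)) || pkg_id[i] != pkg_id[cpu])
            continue;
        core_map[cpu] |= 1u << i;
        core_map[i] |= 1u << cpu;
        if (core_id[i] == core_id[cpu]) {
            sibling_map[cpu] |= 1u << i;
            sibling_map[i] |= 1u << cpu;
        }
    }
}

/* The cleanup the failure path needs: drop @cpu from every peer's
 * maps, then wipe @cpu's own maps and its setup bit. */
static void model_clear_sibling_map(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    for (int i = 0; i < MODEL_NR_CPUS; i++) {
        if (core_map[cpu] & (1u << i))
            core_map[i] &= ~(1u << cpu);
        if (sibling_map[cpu] & (1u << i))
            sibling_map[i] &= ~(1u << cpu);
    }
    core_map[cpu] = 0;
    sibling_map[cpu] = 0;
    sibling_setup_map &= ~(1u << cpu);
}
```

For example, with CPUs 0 and 1 as hyperthreads of one core and CPU 2 on a second core of the same package, bringing all three up and then failing CPU 2 leaves CPUs 0 and 1 with core and sibling maps of 0x3. Without the cleanup, their core maps would still claim CPU 2.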

> + /* was set by cpu_init() */
> + cpumask_clear_cpu(smp_processor_id(), cpu_initialized_mask);
> + cpumask_clear_cpu(smp_processor_id(), cpu_callin_mask);
> + clear_local_APIC();
> + play_dead();
> }
>
> /*
> @@ -774,6 +799,8 @@ do_rest:
> }
>
> if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
> + /* Signal AP that it may continue to boot */
> + cpumask_set_cpu(cpu, cpu_may_complete_boot_mask);
> print_cpu_msr(&cpu_data(cpu));
> pr_debug("CPU%d: has booted.\n", cpu);
> } else {
> @@ -1250,6 +1277,7 @@ static void __ref remove_cpu_from_maps(int cpu)
> cpumask_clear_cpu(cpu, cpu_callin_mask);
> /* was set by cpu_init() */
> cpumask_clear_cpu(cpu, cpu_initialized_mask);
> + cpumask_clear_cpu(cpu, cpu_may_complete_boot_mask);
> numa_remove_cpu(cpu);
> }
>
