Re: [PATCH v15 03/12] cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU

From: David Woodhouse
Date: Tue Mar 21 2023 - 15:15:04 EST


On Mon, 2023-03-20 at 15:30 +0100, Thomas Gleixner wrote:
>
> This causes a subtle issue. The bringup loop above moves all CPUs to
> cpuhp_state == CPUHP_BP_PARALLEL_DYN_END. So the serial bootup will
> start from there and bring them fully up.
>
> Now if a bringup fails, then the rollback will only go back down to
> CPUHP_BP_PARALLEL_DYN_END, which means that the control CPU won't do any
> cleanups below CPUHP_BP_PARALLEL_DYN_END.
>
> That 'fail' is a common case for SMT soft disable via the 'nosmt'
> command line parameter. Due to the marvelous MCE broadcast 'feature' we
> need to bringup the SMT siblings at least to the CPUHP_AP_ONLINE_IDLE
> state once and then roll them back.
>
> While this is not necessarily a fatal problem, it's changing behaviour
> and with quite some of the details hidden in the (then not issued)
> teardown callbacks might cause some hard to decode subtle surprises.
>
> So that second for_each_present_cpu() loop needs to check the return
> value of cpu_up() and issue a full rollback to CPUHP_OFFLINE in case of
> fail.

@@ -1524,8 +1531,22 @@ void bringup_nonboot_cpus(unsigned int setup_max_cpus)
for_each_present_cpu(cpu) {
if (num_online_cpus() >= setup_max_cpus)
break;
- if (!cpu_online(cpu))
- cpu_up(cpu, CPUHP_ONLINE);
+ if (!cpu_online(cpu)) {
+ int ret = cpu_up(cpu, CPUHP_ONLINE);
+
+ /*
+ * For the parallel bringup case, roll all the way back
+ * to CPUHP_OFFLINE on failure; don't leave them in the
+ * parallel stages. This happens in the nosmt case for
+ * non-primary threads.
+ */
+ if (ret && cpuhp_hp_states[CPUHP_BP_PARALLEL_DYN].name) {
+ struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+ if (can_rollback_cpu(st))
+ WARN_ON(cpuhp_invoke_callback_range(false, cpu, st,
+ CPUHP_OFFLINE));
+ }
+ }
}
}


Attachment: smime.p7s
Description: S/MIME cryptographic signature