Re: [PATCH 2/4] cpu/hotplug: CPUHP_BRINGUP_CPU exception in fail injection

From: Peter Zijlstra
Date: Wed Jan 20 2021 - 09:23:11 EST


On Mon, Jan 11, 2021 at 05:10:45PM +0000, vincent.donnefort@xxxxxxx wrote:
> From: Vincent Donnefort <vincent.donnefort@xxxxxxx>
>
> The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
> triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none
> of the atomic states can. Hence, rollback is not possible after a hotunplug
> CPUHP_BRINGUP_CPU step failure and the "fail" interface shouldn't allow
> it. Moreover, the current CPUHP_BRINGUP_CPU teardown callback
> (finish_cpu()) cannot fail anyway.
>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@xxxxxxx>
> ---
> kernel/cpu.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 9121edf..bcd7b2a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
>  		return -EINVAL;
>
>  	/*
> -	 * Cannot fail STARTING/DYING callbacks.
> +	 * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
> +	 * triggered by BRINGUP_CPU bringup callback. Therefore, the latter
> +	 * can't fail during hotunplug, as it would mean we have no way of
> +	 * rolling back the atomic states that have been previously torn
> +	 * down.
>  	 */
> -	if (cpuhp_is_atomic_state(fail))
> +	if (cpuhp_is_atomic_state(fail) ||
> +	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
>  		return -EINVAL;

Should we instead disallow failing any state that has .cant_stop set?
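
A rough sketch of how a .cant_stop based check might look, written as a
standalone userspace model rather than against kernel/cpu.c. Everything
prefixed model_ or MODEL_ is hypothetical, and which model states are marked
atomic or cant_stop is an assumption made for illustration. Only the idea of
rejecting a fail target whose step has .cant_stop set comes from the question
above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily trimmed stand-in for the kernel's enum cpuhp_state. */
enum model_state {
	MODEL_OFFLINE,
	MODEL_PREPARE,		/* ordinary state run on the control CPU */
	MODEL_BRINGUP_CPU,	/* stands in for CPUHP_BRINGUP_CPU */
	MODEL_AP_STARTING,	/* stands in for an atomic STARTING/DYING state */
	MODEL_AP_ONLINE_DYN,	/* ordinary state run on the hotplugged CPU */
	MODEL_ONLINE,
	MODEL_NR_STATES
};

/* Hypothetical per-state descriptor; only the two flags matter here. */
struct model_step {
	const char *name;
	bool atomic;		/* assumption: marks STARTING/DYING-like states */
	bool cant_stop;		/* assumption: state must not be a stop/fail point */
};

static const struct model_step model_steps[MODEL_NR_STATES] = {
	[MODEL_OFFLINE]		= { .name = "offline" },
	[MODEL_PREPARE]		= { .name = "prepare" },
	[MODEL_BRINGUP_CPU]	= { .name = "bringup", .cant_stop = true },
	[MODEL_AP_STARTING]	= { .name = "starting", .atomic = true },
	[MODEL_AP_ONLINE_DYN]	= { .name = "online-dyn" },
	[MODEL_ONLINE]		= { .name = "online" },
};

/* Look up the descriptor for a state; callers must pass a valid state. */
static const struct model_step *model_get_step(int state)
{
	assert(state >= 0 && state < MODEL_NR_STATES);
	return &model_steps[state];
}

/*
 * Return 0 if a failure may be injected at 'fail', -EINVAL otherwise.
 * Atomic states are rejected as before; additionally, any state whose
 * step is flagged cant_stop is rejected.
 */
static int model_check_fail(int fail)
{
	const struct model_step *step;

	if (fail < 0 || fail >= MODEL_NR_STATES)
		return -EINVAL;

	step = model_get_step(fail);
	if (step->atomic || step->cant_stop)
		return -EINVAL;

	return 0;
}
```

Note that, unlike the quoted hunk, this check does not look at st->state, so
the model rejects the BRINGUP_CPU stand-in in both the bringup and the
hotunplug direction.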