Re: [PATCH v16 2/8] cpu/hotplug: Reset task stack state in _cpu_up()

From: Mark Rutland
Date: Wed Mar 22 2023 - 07:06:34 EST


On Tue, Mar 21, 2023 at 07:40:02PM +0000, Usama Arif wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> Commit dce1ca0525bf ("sched/scs: Reset task stack state in bringup_cpu()")
> ensured that the shadow call stack and KASAN poisoning were removed from
> a CPU's stack each time that CPU is brought up, not just once.
>
> This is not incorrect. However, with parallel bringup, an architecture
> may obtain the idle thread for a new CPU from a pre-bringup stage, by
> calling idle_thread_get() for itself. This would mean that the cleanup
> in bringup_cpu() would be too late.
>
> Move the SCS/KASAN cleanup to the generic _cpu_up() function instead,
> which already ensures that the new CPU's stack is available, purely to
> allow for early failure. This occurs when the CPU to be brought up is
> in the CPUHP_OFFLINE state, which should correctly do the cleanup any
> time the CPU has been taken down to the point where such is needed.
>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>

This all sounds fine to me, and the patch itself looks good.

I built an arm64 kernel with the first three patches from this series applied
atop v6.3-rc3, with defconfig + CONFIG_SHADOW_CALL_STACK=y +
CONFIG_KASAN_INLINE=y + CONFIG_KASAN_STACK=y. I then hotplugged a cpu with:

while true; do
echo 0 > /sys/devices/system/cpu/cpu1/online;
echo 1 > /sys/devices/system/cpu/cpu1/online;
done

... and that was perfectly happy to run for minutes with no unexpected failures.

To make sure I wasn't avoiding issues by chance, I also tried with each of
scs_task_reset() and kasan_unpoison_task_stack() commented out. With
scs_task_reset() commented out, cpu re-onlining fails after a few iterations,
and with kasan_unpoison_task_stack() commented out I get a KASAN splat upon the
first re-onlining. So that all looks good.

FWIW, for the first three patches:

Reviewed-by: Mark Rutland <mark.rutland@xxxxxxx>
Tested-by: Mark Rutland <mark.rutland@xxxxxxx> [arm64]

Mark.

> ---
> kernel/cpu.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 6c0a92ca6bb5..43e0a77f21e8 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -591,12 +591,6 @@ static int bringup_cpu(unsigned int cpu)
> struct task_struct *idle = idle_thread_get(cpu);
> int ret;
>
> - /*
> - * Reset stale stack state from the last time this CPU was online.
> - */
> - scs_task_reset(idle);
> - kasan_unpoison_task_stack(idle);
> -
> /*
> * Some architectures have to walk the irq descriptors to
> * setup the vector space for the cpu which comes online.
> @@ -1383,6 +1377,12 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
> ret = PTR_ERR(idle);
> goto out;
> }
> +
> + /*
> + * Reset stale stack state from the last time this CPU was online.
> + */
> + scs_task_reset(idle);
> + kasan_unpoison_task_stack(idle);
> }
>
> cpuhp_tasks_frozen = tasks_frozen;
> --
> 2.25.1
>