Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU

From: Nathan Lynch
Date: Tue Apr 30 2019 - 12:34:40 EST


Thiago Jung Bauermann <bauerman@xxxxxxxxxxxxx> writes:
> This can be a problem because if the busy loop finishes too early, then the
> kernel may offline another CPU before the previous one finished dying,
> which would lead to two concurrent calls to rtas-stop-self, which is
> prohibited by the PAPR.
>
> Since the hotplug machinery already assumes that cpu_die() is going to
> work, we can simply loop until the CPU stops.
>
> Also change the loop to wait 100 µs between each call to
> smp_query_cpu_stopped() to avoid querying RTAS too often.

[...]

> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 97feb6e79f1a..d75cee60644c 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -214,13 +214,17 @@ static void pseries_cpu_die(unsigned int cpu)
>  			msleep(1);
>  		}
>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> -
> -		for (tries = 0; tries < 25; tries++) {
> +		/*
> +		 * rtas_stop_self() panics if the CPU fails to stop and our
> +		 * callers already assume that we are going to succeed, so we
> +		 * can just loop until the CPU stops.
> +		 */
> +		while (true) {
>  			cpu_status = smp_query_cpu_stopped(pcpu);
>  			if (cpu_status == QCSS_STOPPED ||
>  			    cpu_status == QCSS_HARDWARE_ERROR)
>  				break;
> -			cpu_relax();
> +			udelay(100);
>  		}
>  	}
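The difference between the old and new loops can be sketched with a small userspace simulation. Everything in it is an assumption for illustration: sim_cpu, sim_query(), wait_bounded() and wait_unbounded() are hypothetical names, the enum only reuses the QCSS_* names (its values are arbitrary, not the kernel's), and the timings (1 µs of simulated time per cpu_relax(), a CPU that takes 5 ms to stop) are made up. Time is simulated rather than measured so the outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; only the names mirror the kernel's QCSS_*. */
enum qcss { QCSS_STOPPED, QCSS_STOPPING, QCSS_HARDWARE_ERROR };

/* Hypothetical simulated CPU with a simulated microsecond clock. */
struct sim_cpu {
	long now_us;      /* current simulated time */
	long stops_at_us; /* time at which the CPU finishes dying (assumed) */
	int queries;      /* number of simulated RTAS queries issued */
};

/* Stand-in for smp_query_cpu_stopped(): counts the query and reports
 * whether the simulated CPU has stopped by the current simulated time. */
static enum qcss sim_query(struct sim_cpu *c)
{
	assert(c != NULL);
	c->queries++;
	return c->now_us >= c->stops_at_us ? QCSS_STOPPED : QCSS_STOPPING;
}

/* Old loop: at most 25 queries separated only by a cpu_relax()-style
 * pause, modelled here (by assumption) as 1 us of simulated time. */
static enum qcss wait_bounded(struct sim_cpu *c)
{
	enum qcss st = QCSS_STOPPING;

	for (int tries = 0; tries < 25; tries++) {
		st = sim_query(c);
		if (st == QCSS_STOPPED || st == QCSS_HARDWARE_ERROR)
			break;
		c->now_us += 1;
	}
	return st; /* may still be QCSS_STOPPING: the early exit */
}

/* Patched loop: query until a final state is reported, advancing the
 * simulated clock by 100 us (the patch's udelay(100)) between queries. */
static enum qcss wait_unbounded(struct sim_cpu *c)
{
	for (;;) {
		enum qcss st = sim_query(c);

		if (st == QCSS_STOPPED || st == QCSS_HARDWARE_ERROR)
			return st;
		c->now_us += 100;
	}
}
```

With stops_at_us = 5000, wait_bounded() gives up after 25 queries while the CPU is still stopping, which is the early exit the patch description warns about; wait_unbounded() returns QCSS_STOPPED after 51 queries spaced 100 µs apart.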

I agree with looping indefinitely, but doesn't the loop need a cond_resched()
or a similar check?
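One way to read the cond_resched() suggestion: the patched loop busy-waits with udelay(100) and does not voluntarily give up the CPU, so presumably a cond_resched() would go inside the while (true) loop. The sketch below is only a userspace analogue that uses POSIX sched_yield() in place of cond_resched(); poll_until(), struct countdown and countdown_done() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical polling helper: calls done(ctx) until it returns true and
 * yields the processor between attempts so other runnable threads can
 * make progress. sched_yield() stands in for cond_resched() here.
 * Returns the number of times the loop yielded. */
static unsigned long poll_until(bool (*done)(void *ctx), void *ctx)
{
	unsigned long yields = 0;

	assert(done != NULL);
	while (!done(ctx)) {
		sched_yield();
		yields++;
	}
	return yields;
}

/* Hypothetical condition for demonstration: returns true once remaining
 * has reached 0, decrementing it on every check that returns false. */
struct countdown {
	int remaining;
};

static bool countdown_done(void *ctx)
{
	struct countdown *c = ctx;

	if (c->remaining == 0)
		return true;
	c->remaining--;
	return false;
}
```

For example, with struct countdown c = { .remaining = 3 }, poll_until(countdown_done, &c) checks the condition four times and returns 3. One difference worth keeping in mind: cond_resched() only reschedules when the scheduler has flagged that it should, whereas sched_yield() always offers the CPU to other runnable threads.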