Re: [RFC][PATCH] cpu/hotplug: wait for cpuset_hotplug_work to finish on cpu online

From: Peter Zijlstra
Date: Mon Dec 07 2020 - 03:39:46 EST


On Thu, Dec 03, 2020 at 05:14:31PM +0000, Alexey Klimov wrote:
> When a CPU offlined and onlined via device_offline() and device_online()
> the userspace gets uevent notification. If, after receiving uevent,
> userspace executes sched_setaffinity() on some task trying to move it
> to a recently onlined CPU, then it will fail with -EINVAL. Userspace needs
> to wait around 5..30 ms before sched_setaffinity() will succeed for
> recently onlined CPU after receiving uevent.

Right.

> Unfortunately, the execution time of
> echo 1 > /sys/devices/system/cpu/cpuX/online roughly doubled with this
> change (on my test machine).

Nobody cares, it's hotplug, it's supposed to be slow :-) That is,
we fundamentally shift the work _to_ the hotplug path, so as to keep
everybody else fast.

> The nature of this bug is also described here (with different consequences):
> https://lore.kernel.org/lkml/20200211141554.24181-1-qais.yousef@xxxxxxx/

Yeah, pesky deadlocks.. someone was going to try again.


> kernel/cpu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 6ff2578ecf17..f39a27a7f24b 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -15,6 +15,7 @@
> #include <linux/sched/smt.h>
> #include <linux/unistd.h>
> #include <linux/cpu.h>
> +#include <linux/cpuset.h>
> #include <linux/oom.h>
> #include <linux/rcupdate.h>
> #include <linux/export.h>
> @@ -1275,6 +1276,8 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
> }
>
> err = _cpu_up(cpu, 0, target);
> + if (!err)
> + cpuset_wait_for_hotplug();
> out:
> cpu_maps_update_done();
> return err;

My only consideration is if doing that flush under cpu_add_remove_lock()
is wise.