Re: [Question] set_cpus_allowed_ptr() call failed at cpuset_attach()

From: Michal Koutný
Date: Wed Jan 19 2022 - 08:02:38 EST


On Fri, Jan 14, 2022 at 09:15:06AM +0800, Zhang Qiao <zhangqiao22@xxxxxxxxxx> wrote:
> I found the following warning log on qemu. I migrated a task from one cpuset cgroup to
> another, while I also performed the cpu hotplug operation, and got following calltrace.

Do you have more information on which hotplug event is involved and which
error you observe from set_cpus_allowed_ptr()? (And what are the src/dst
cpusets wrt root/non-root?)

> Can we use cpus_read_lock()/cpus_read_unlock() to guarantee that set_cpus_allowed_ptr()
> doesn't fail, as follows:

I'm wondering what can be wrong with the current actors:

cpuset_can_attach
  down_read(cpuset_rwsem)
    // check all migratees
  up_read(cpuset_rwsem)
                                [ _cpu_down / cpuhp_setup_state ]
                                schedule_work
                                ...
                                cpuset_hotplug_update_tasks
                                  down_write(cpuset_rwsem)
                                  up_write(cpuset_rwsem)
                                ... flush_work
                                [ _cpu_down / cpu_up_down_serialize_trainwrecks ]
cpuset_attach
  down_write(cpuset_rwsem)
    set_cpus_allowed_ptr(allowed_cpus_weird)
  up_write(cpuset_rwsem)

The statement in cpuset_attach() about the cpuset_can_attach() test is
not so strong, since task_can_attach() is mostly a pass for non-deadline
tasks. Still, the use of cpuset_rwsem above should (I may be mistaken)
synchronize the changes of the cpuset's cpu masks, so I'd be interested in
the details above to understand why the current approach doesn't work.
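
To make the interleaving concrete, here is a minimal userspace sketch in C
that replays the three steps sequentially. It is not kernel code: struct
model_cpuset and all model_* helpers are hypothetical names, and the rwsem
is reduced to two counters that only assert the locking rules. It shows two
things under those assumptions: the mask seen at check time may differ from
the mask seen at attach time, and yet every read and write of the mask is
serialized by the same lock.

```c
#include <assert.h>

/* Hypothetical userspace model, not kernel code. */
struct model_cpuset {
	unsigned long effective_cpus;	/* bit n set => CPU n is allowed */
	int readers;			/* stand-in: rwsem held for read */
	int write_locked;		/* stand-in: rwsem held for write */
};

static void model_down_read(struct model_cpuset *cs)
{
	assert(!cs->write_locked);
	cs->readers++;
}

static void model_up_read(struct model_cpuset *cs)
{
	assert(cs->readers > 0);
	cs->readers--;
}

static void model_down_write(struct model_cpuset *cs)
{
	assert(!cs->write_locked && cs->readers == 0);
	cs->write_locked = 1;
}

static void model_up_write(struct model_cpuset *cs)
{
	assert(cs->write_locked);
	cs->write_locked = 0;
}

/* Step 1: check under the read lock; returns the mask seen at check time. */
static unsigned long model_can_attach(struct model_cpuset *cs)
{
	unsigned long seen;

	model_down_read(cs);
	seen = cs->effective_cpus;
	model_up_read(cs);
	return seen;
}

/* Step 2: the hotplug work removes CPUs from the mask under the write lock. */
static void model_hotplug_update(struct model_cpuset *cs, unsigned long offlined)
{
	model_down_write(cs);
	cs->effective_cpus &= ~offlined;
	model_up_write(cs);
}

/* Step 3: attach under the write lock; returns the mask it would apply. */
static unsigned long model_attach(struct model_cpuset *cs)
{
	unsigned long applied;

	model_down_write(cs);
	applied = cs->effective_cpus;
	model_up_write(cs);
	return applied;
}
```

Calling model_can_attach(), model_hotplug_update() and model_attach() in
that order reproduces the timeline: the attach step works on the updated
mask, not on the one that was checked earlier.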

The additional cpus_read_{,un}lock (when reordered wrt cpuset_rwsem)
may work, but your patch should explain why (in what situation).
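
As an illustration of the kind of situation such an explanation could
describe, here is a userspace sketch in C. The scenario is an assumption
of mine and is not confirmed anywhere in this thread: the attach path
computes a mask from the set of online CPUs, a CPU goes away in between,
and applying the mask then fails. struct model_sys and the model_* helpers
are hypothetical names, and -EINVAL is an assumed failure mode for the
model. A real hotplug writer would sleep while readers hold the hotplug
lock; the model returns -EBUSY instead so that it stays sequential.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model, not kernel code. */
struct model_sys {
	unsigned long online;	/* CPUs currently online */
	unsigned long cs_cpus;	/* CPUs configured in the destination cpuset */
	int hotplug_readers;	/* stand-in for cpus_read_lock() holders */
};

/* Stand-in for a CPU going offline; refused while a reader holds the lock. */
static int model_cpu_down(struct model_sys *s, unsigned long cpu_bit)
{
	if (s->hotplug_readers > 0)
		return -EBUSY;
	s->online &= ~cpu_bit;
	return 0;
}

/*
 * Stand-in for the attach path. A hotplug attempt on race_cpu_bit happens
 * between computing the mask and applying it. Returns 0 on success, or
 * -EINVAL if the computed mask no longer contains an online CPU.
 */
static int model_attach_with_hotplug(struct model_sys *s, int take_hotplug_lock,
				     unsigned long race_cpu_bit)
{
	unsigned long mask;
	int ret;

	if (take_hotplug_lock)
		s->hotplug_readers++;		/* models cpus_read_lock() */

	mask = s->cs_cpus & s->online;		/* compute from current state */
	(void)model_cpu_down(s, race_cpu_bit);	/* racing hotplug attempt */
	ret = (mask & s->online) ? 0 : -EINVAL;	/* apply the computed mask */

	if (take_hotplug_lock) {
		assert(s->hotplug_readers > 0);
		s->hotplug_readers--;		/* models cpus_read_unlock() */
	}
	return ret;
}
```

With take_hotplug_lock set to 0 the racing offline succeeds and the apply
step fails; with it set to 1 the online set is stable across both steps.
Whether this matches the reported warning is exactly what the details
requested above would have to show.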

My .02€,
Michal