(cc'ing Waiman)

On Mon, Jul 11, 2022 at 06:46:29PM +0100, Qais Yousef wrote:
> Have you tried running with CONFIG_PROVE_LOCKING enabled? It'll print useful
> output about the deadlock. But your explanation was good and clear to me.

I don't think lockdep would be able to track the CPU1 -> CPU2 dependency here,
unfortunately.
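Roughly, the problem is this. A per-task lock-order tracker only records "took B while holding A" edges inside one context. CPU0's cpuset attach path gives threadgroup_rwsem -> cpu_hotplug_lock. CPU1, however, only *waits* for CPU2 while holding cpu_hotplug_lock, and CPU2 takes threadgroup_rwsem with nothing held. As a result, the closing edge never shows up. Below is a toy, lockdep-inspired model of that argument. It is not how lockdep is implemented, and the enum, the dep[][] table and all helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes mirroring the two kernel locks involved. */
enum lock_class { THREADGROUP_RWSEM, CPU_HOTPLUG_LOCK, NR_CLASSES };

/* dep[a][b] == true means "b was acquired (or waited on) while a was held". */
static bool dep[NR_CLASSES][NR_CLASSES];

static void add_dep(enum lock_class held, enum lock_class next)
{
	assert(held < NR_CLASSES && next < NR_CLASSES);
	dep[held][next] = true;
}

/* Depth-first search: is 'target' reachable from 'from'? */
static bool reachable(int from, int target, bool *seen)
{
	if (from == target)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_CLASSES; n++)
		if (dep[from][n] && !seen[n] && reachable(n, target, seen))
			return true;
	return false;
}

/* A cycle exists if following some edge c -> n leads back to c. */
static bool has_cycle(void)
{
	for (int c = 0; c < NR_CLASSES; c++) {
		for (int n = 0; n < NR_CLASSES; n++) {
			bool seen[NR_CLASSES] = { false };

			if (dep[c][n] && reachable(n, c, seen))
				return true;
		}
	}
	return false;
}

/* Edges visible within a single task: CPU0 holds threadgroup_rwsem and
 * then takes cpu_hotplug_lock. CPU1 only waits for CPU2, and CPU2 takes
 * threadgroup_rwsem with nothing held, so neither adds an edge. */
static void record_task_local_edges(void)
{
	add_dep(THREADGROUP_RWSEM, CPU_HOTPLUG_LOCK);
}

/* The cross-CPU edge: CPU1 holds cpu_hotplug_lock while waiting for
 * bring-up work on CPU2 that takes threadgroup_rwsem. */
static void record_cross_cpu_edge(void)
{
	add_dep(CPU_HOTPLUG_LOCK, THREADGROUP_RWSEM);
}
```

With only the task-local edges recorded, has_cycle() returns false. Adding the cross-CPU wait edge closes the loop.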
> AFAIU:
>
> CPU0                                  CPU1                                CPU2
>
> // attach task to a different
> // cpuset cgroup via sysfs
> __acquire(cgroup_threadgroup_rwsem)
>
>                                       // bring up CPU2 online
>                                       __acquire(cpu_hotplug_lock)
>                                       // wait for CPU2 to come online
>                                                                           // bring the CPU online
>                                                                           // call cpufreq_online() which tries
>                                                                           // to create sugov kthread
> __acquire(cpu_hotplug_lock)                                               copy_process()
>                                                                             cgroup_can_fork()
>                                                                               cgroup_css_set_fork()
>                                                                                 __acquire(cgroup_threadgroup_rwsem)
>
> // blocks forever                     // blocks forever                   // blocks forever
>
> Is this a correct summary of the problem?
>
> The locks are acquired in reverse order and we end up with a deadlock.
> I believe the same happens on offline; it's just that the path that takes
> cgroup_threadgroup_rwsem on CPU2 is different.
>
> This will be a tricky one. Your proposed patch might fix it for this case, but
> if anything else creates a kthread when a CPU goes online/offline, we'll hit
> the same problem again.
>
> I haven't reviewed your patch, to be honest, but I think it's worth checking
> first whether something can be done at the 'right level'.
>
> Needs head scratching from my side at least. This is not the first
> locking issue between hotplug and cpuset :-/

Well, the only thing I can think of is always grabbing cpus_read_lock()
before grabbing threadgroup_rwsem. Waiman, what do you think?
Thanks.