Re: [PATCH v2 -next] cgroup: remove offline draining in root destruction to avoid hung_tasks
From: Hillf Danton
Date: Thu Aug 14 2025 - 22:46:20 EST
On Fri, Jul 25, 2025 at 09:42:05AM +0800, Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote:
> > On Tue, Jul 22, 2025 at 11:27:33AM +0000, Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote:
> >> CPU0                              CPU1
> >> mount perf_event                  umount net_prio
> >> cgroup1_get_tree                  cgroup_kill_sb
> >> rebind_subsystems                 // root destruction enqueues
> >>                                   // cgroup_destroy_wq
> >> // kill all perf_event css
> >> // one perf_event css A is dying
> >> // css A offline enqueues cgroup_destroy_wq
> >> // root destruction will be executed first
> >>                                   css_free_rwork_fn
> >>                                   cgroup_destroy_root
> >>                                   cgroup_lock_and_drain_offline
> >>                                   // some perf descendants are dying
> >>                                   // cgroup_destroy_wq max_active = 1
> >>                                   // waiting for css A to die
> >>
> >> Problem scenario:
> >> 1. CPU0 mounts perf_event (rebind_subsystems)
> >> 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
> >> 3. A dying perf_event CSS gets queued for offline after root destruction
> >> 4. Root destruction waits for offline completion, but offline work is
> >> blocked behind root destruction in cgroup_destroy_wq (max_active=1)
> >
> > What's concerning me is why umount of the net_prio hierarchy waits for
> > draining of the default hierarchy? (Where you then run into conflict with
> > perf_event that's implicit_on_dfl.)
> >
/*
* cgroup destruction makes heavy use of work items and there can be a lot
* of concurrent destructions. Use a separate workqueue so that cgroup
* destruction work items don't end up filling up max_active of system_wq
* which may lead to deadlock.
*/
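That comment sits above cgroup_destroy_wq in kernel/cgroup/cgroup.c, yet the
queue itself is allocated with max_active = 1, so its work items still run
strictly one at a time. The hang above boils down to one work item (root
destruction draining offline csses) sleeping on another work item (the css
offline) queued behind it on that single-active queue. A toy module along the
following lines (hypothetical names, illustrative only, not the cgroup path
itself) produces the same shape of hang:

/* toy_wq_hang.c - illustrative sketch of a max_active = 1 self-deadlock */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

static struct workqueue_struct *toy_wq;
static struct work_struct first_work, second_work;
static DECLARE_COMPLETION(second_done);

/* Plays the root-destruction role: waits for the item queued after it. */
static void first_fn(struct work_struct *w)
{
	/*
	 * second_work cannot become active while this item runs
	 * (max_active = 1), so the kworker blocks here indefinitely
	 * and will eventually be flagged by the hung task detector.
	 */
	wait_for_completion(&second_done);
}

/* Plays the css-offline role: would unblock first_fn, but never runs. */
static void second_fn(struct work_struct *w)
{
	complete(&second_done);
}

static int __init toy_init(void)
{
	toy_wq = alloc_workqueue("toy_destroy", 0, 1);
	if (!toy_wq)
		return -ENOMEM;

	INIT_WORK(&first_work, first_fn);
	INIT_WORK(&second_work, second_fn);

	/* Pin both items to the same CPU so they share one worker pool. */
	queue_work_on(0, toy_wq, &first_work);
	queue_work_on(0, toy_wq, &second_work);
	return 0;
}

static void __exit toy_exit(void)
{
	complete(&second_done);		/* let first_fn finish so the wq drains */
	destroy_workqueue(toy_wq);
}

module_init(toy_init);
module_exit(toy_exit);
MODULE_LICENSE("GPL");

The open question is whether the destruction path should tolerate that
ordering at all, or whether the queue should simply allow more in-flight
items.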
If the task hang can be reliably reproduced, it is the right time to lift the
max_active = 1 cap on cgroup_destroy_wq, in line with its own comment about
heavy, concurrent destruction work.
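
For reference, and assuming the allocation still lives in cgroup_wq_init() in
kernel/cgroup/cgroup.c, lifting the cap would roughly mean passing 0 (the
workqueue default, WQ_DFL_ACTIVE) instead of 1 for max_active. A minimal,
untested sketch, not a proposed patch:

static int __init cgroup_wq_init(void)
{
	/*
	 * 0 requests the default max_active instead of serializing all
	 * destruction work items behind a single in-flight item.
	 */
	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 0);
	BUG_ON(!cgroup_destroy_wq);
	return 0;
}
core_initcall(cgroup_wq_init);

Whether any ordering guarantees elsewhere in the destruction path rely on the
current serialization would of course need to be audited first.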