Re: [PATCH] cgroup: don't queue css_release_work if one already pending

From: Tadeusz Struk
Date: Fri May 20 2022 - 12:38:27 EST


On 5/20/22 01:13, Tejun Heo wrote:
On Thu, May 19, 2022 at 04:26:51PM -0700, Tadeusz Struk wrote:
On 5/19/22 04:23, Hillf Danton wrote:
On Wed, 18 May 2022 09:48:21 -0700 Tadeusz Struk wrote:
On 4/22/22 04:05, Michal Koutny wrote:
On Thu, Apr 21, 2022 at 02:00:56PM -1000, Tejun Heo<tj@xxxxxxxxxx> wrote:
If this is the case, we need to hold an extra reference to be put by the
css_killed_work_fn(), right?
That put could trigger INIT_WORK in css_release() and warning [1]
on init active (active state 0) object OTOH as the same
css->destroy_work is used in both kill and release paths.

Hmm... wouldn't the extra reference keep release from happening?

Will this help if there would be two WQs, one for the css_release path
and one for the rcu_work?

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index adb820e98f24..a4873b33e488 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -124,6 +124,7 @@ DEFINE_PERCPU_RWSEM(cgroup_threadgroup_rwsem);
* which may lead to deadlock.
*/
static struct workqueue_struct *cgroup_destroy_wq;
+static struct workqueue_struct *cgroup_destroy_rcu_wq;

I don't understand why this would help. Care to elaborate?

I think it will help to solve the list corruption issue:

list_add corruption. prev->next should be next (ffff8881f705c060), but was ffff888113123870. (prev=ffff888113123870).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:28!

as this is a result of enqueuing the same css->destroy_work onto the same WQ,
one on the rcu path and one on the css_release path.
I will prototype it today and test with syzbot.

--
Thanks,
Tadeusz