Re: [PATCH] cgroup: don't queue css_release_work if one already pending

From: Tadeusz Struk
Date: Wed Apr 13 2022 - 11:39:17 EST


Hi Hillf,
On 4/13/22 02:56, Hillf Danton wrote:
On Tue, 12 Apr 2022 12:24:59 -0700 Tadeusz Struk wrote:
Syzbot found a corrupted list bug scenario that can be triggered from
cgroup css_create(). The reproduces writes to cgroup.subtree_control
file, which invokes cgroup_apply_control_enable(), css_create(), and
css_populate_dir(), which then randomly fails with a fault injected -ENOMEM.
In such scenario the css_create() error path rcu enqueues css_free_rwork_fn
work for an css->refcnt initialized with css_release() destructor,
and there is a chance that the css_release() function will be invoked

Could you tip-point the percpu_ref_kill needed to trigger the css_release()?

What I think happens is that the write triggers:
cgroup_subtree_control_write()->cgroup_apply_control()->cgroup_apply_control_enable()->css_create()

which, allocates and initializes the css, then fails in cgroup_idr_alloc(), bails out and calls queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);

then cgroup_subtree_control_write bails out to out_unlock, which then goes:

cgroup_kn_unlock()->cgroup_put()->css_put()->percpu_ref_put(&css->refcnt)->percpu_ref_put_many(ref)
which then calls ref->data->release(ref); and tries to enqueue the same causing list corruption in insert_work.


for a cgroup_subsys_state, for which a destroy_work has already been
queued via css_create() error path. This causes a list_add corruption
as can be seen in the syzkaller report [1].
This can be avoided by adding a check to css_release() that checks
if it has already been enqueued.

[1] https://syzkaller.appspot.com/bug?id=e26e54d6eac9d9fb50b221ec3e4627b327465dbd

Given my failure of finding the lore URL to what was
Reported-by: syzbot+e42ae441c3b10acf9e9d@xxxxxxxxxxxxxxxxxxxxxxxxx
such a link is welcome to see the list corruption.


syzkaller doesn't send reports to any kernel mailing lists that are rachived in lore. The relevant links can be found in the dashboard, link [1] in the patch.

https://syzkaller.appspot.com/bug?id=e26e54d6eac9d9fb50b221ec3e4627b327465dbd

See the links in:
Crash: BUG: corrupted list in insert_work (log)

crash: https://syzkaller.appspot.com/text?tag=CrashReport&x=11bd5e9b700000
log: https://syzkaller.appspot.com/text?tag=CrashLog&x=16bd5e9b700000


Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Zefan Li <lizefan.x@xxxxxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Christian Brauner <brauner@xxxxxxxxxx>
Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Cc: Andrii Nakryiko <andrii@xxxxxxxxxx>
Cc: Martin KaFai Lau <kafai@xxxxxx>
Cc: Song Liu <songliubraving@xxxxxx>
Cc: Yonghong Song <yhs@xxxxxx>
Cc: John Fastabend <john.fastabend@xxxxxxxxx>
Cc: KP Singh <kpsingh@xxxxxxxxxx>
Cc: <cgroups@xxxxxxxxxxxxxxx>
Cc: <netdev@xxxxxxxxxxxxxxx>
Cc: <bpf@xxxxxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Cc: <linux-kernel@xxxxxxxxxxxxxxx>

Reported-by: syzbot+e42ae441c3b10acf9e9d@xxxxxxxxxxxxxxxxxxxxxxxxx
Fixes: 8f36aaec9c92 ("cgroup: Use rcu_work instead of explicit rcu and work item")
Signed-off-by: Tadeusz Struk <tadeusz.struk@xxxxxxxxxx>
---
kernel/cgroup/cgroup.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index adb820e98f24..9ae2de29f8c9 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5210,8 +5210,11 @@ static void css_release(struct percpu_ref *ref)
struct cgroup_subsys_state *css =
container_of(ref, struct cgroup_subsys_state, refcnt);
- INIT_WORK(&css->destroy_work, css_release_work_fn);
- queue_work(cgroup_destroy_wq, &css->destroy_work);
+ if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT,
+ work_data_bits(&css->destroy_work))) {
+ INIT_WORK(&css->destroy_work, css_release_work_fn);
+ queue_work(cgroup_destroy_wq, &css->destroy_work);
+ }
}
static void init_and_link_css(struct cgroup_subsys_state *css,
--
2.35.1


--
Thanks,
Tadeusz