Re: [PATCH 2/2] cgroup: Use separate work structs on css release path

From: Michal Koutný
Date: Wed May 25 2022 - 12:15:13 EST


On Wed, May 25, 2022 at 05:15:17PM +0200, Michal Koutný <mkoutny@xxxxxxxx> wrote:
> // ref=1: only base reference
> kill_css()
>   css_get() // fuse, ref+=1 == 2
>   percpu_ref_kill_and_confirm
>     // ref -= 1 == 1: kill base references
>     [via rcu]
>     css_killed_ref_fn == refcnt.confirm_switch
>       queue_work(css->destroy_work)        (1)
>       [via css->destroy_work]
>       css_killed_work_fn == wq.func
>         offline_css() // needs fuse
>         css_put // ref -= 1 == 0: de-fuse, was last
>           ...
>           percpu_ref_put_many
>             css_release
>               queue_work(css->destroy_work)    (2)
>               [via css->destroy_work]
>               css_release_work_fn == wq.func

Apologies, this is a wrong explanation. (I thought it explained why
Tadeusz's patch with the double get/put didn't fix it, i.e. no number of
extra get/put pairs would help under the explanation above.)

But the above is not correct. I've looked at the stack trace [1]: the
offending percpu_ref_put_many() is called from the RCU callback
percpu_ref_switch_to_atomic_rcu(), so I can't actually see why the
refcount drops to zero there...
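
For reference, the counting in the quoted (retracted) sequence can be
modelled in user space. This is only a toy sketch, not kernel code: the
toy_* names are made up for illustration, a plain int stands in for the
percpu_ref, and the placement of the extra get/put pairs is an assumption
about what a double get/put would look like. It says nothing about what
actually happens in the stack trace.

```c
#include <assert.h>

/* Toy model: a plain int instead of css->refcnt (a percpu_ref). */
struct toy_css {
	int ref;            /* reference count */
	int destroy_queued; /* how often destroy_work was queued */
};

/* Stands in for queue_work(css->destroy_work). */
static void toy_queue_destroy_work(struct toy_css *css)
{
	css->destroy_queued++;
}

static void toy_css_get(struct toy_css *css)
{
	css->ref++;
}

/* Dropping the last reference queues destroy_work again, as in (2). */
static void toy_css_put(struct toy_css *css)
{
	assert(css->ref > 0);
	if (--css->ref == 0)
		toy_queue_destroy_work(css);
}

/*
 * Walk the quoted sequence with `extra_pairs` additional get/put pairs
 * (hypothetical placement: taken next to the fuse, dropped in the work
 * function). Returns how many times destroy_work was queued.
 */
int toy_walk(int extra_pairs)
{
	struct toy_css css = { .ref = 1, .destroy_queued = 0 };
	int i;

	/* kill_css() */
	toy_css_get(&css);                 /* fuse */
	for (i = 0; i < extra_pairs; i++)
		toy_css_get(&css);
	css.ref -= 1;                      /* kill base reference */
	toy_queue_destroy_work(&css);      /* (1) */

	/* css_killed_work_fn() */
	for (i = 0; i < extra_pairs; i++)
		toy_css_put(&css);
	toy_css_put(&css);                 /* de-fuse, last reference */

	assert(css.ref == 0);
	return css.destroy_queued;
}
```

In this model toy_walk() returns the same count for any number of extra
pairs, because the last put always happens inside the work function. That
is the sense in which any number of extra references wouldn't help under
the quoted explanation.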

Regards,
Michal