Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() alongwith percpu_ref_kill_and_confirm()

From: Kent Overstreet
Date: Wed Jun 12 2013 - 17:46:39 EST

On Wed, Jun 12, 2013 at 02:17:47PM -0700, Tejun Heo wrote:
> Yeap, this is icky. If you have any better ideas, I'm all ears.

I'm reading through the cgroup patch/code now - this refcounting is
_hairy_ so I could certainly believe the way you've done it is the
sanest way that'll work.

But it does feel like some of the ordering/ownership is backwards here,
and that's where the need for confirm_kill is coming from - also, the
fact that css_tryget() is used more than css_get() is... suspicious.

Basically, you're wanting the lifetime of the subsystems to be
controlled by the lifetime of the cgroup. If that's the case, then
nothing should be taking refs on them (i'm not sure if that's actually
the case) and they shouldn't be taking refs on the cgroup - the cgroup
should kill them directly when its ref goes to 0.

Of course, if they can't just be killed and they really do need
independent lifetimes, then that's a problem - though embedding two
refcounts in the cgroup might solve that (one for the subsystems, one
for everything else).

Uhm, cgroup_offline_fn() seems weird. First it's telling the subsystems
to go away - but all we know is that tryget() is failing, there could
still be outstanding refs. What am I missing? offline_css() doesn't
appear to wait for any refs to hit 0.

Then it says it's putting base refs... but base refs should be put with
percpu_ref_kill(), so what refs are those really?

Then there's a list_del_rcu() - but that should probably happen _before_
the call_rcu that percpu_ref_kill() does (in the absense of the bizzare
cgroup stuff you need the list_del_rcu() or whatever to happen first, so
that you know no new gets() can happen, then then after the call_rcu()
it's safe to drop the base ref... if you drop it before the rcu barrier
you can have a get happen after the ref already hit 0.)

So maybe those lists just aren't used by anything that would take a ref?
Or is there a barrier or something somewhere else that makes it safe?

> > > -void percpu_ref_kill(struct percpu_ref *ref)
> > > +void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> > > + percpu_ref_func_t *confirm_kill)
> >
> > Passing release to percpu_ref_init() and confirm_kill to
> > percpu_ref_kill() is inconsistent. Can we pass them both to
> > percpu_ref_init()?
> I don't know. Maybe. While they're stored in the same place,
> @confirm_kill is really an optional part of killing itself, so
> specifying it to kill *seems* like the better place and it also marks
> it clearly that something funky is going on during while killing the
> reference count.

I missed the part where you kept percpu_ref_kill() as a wrapper - the
way you did it makes more sense now.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at