Re: [PATCH 1/2] perf_events: add cgroup support (v8)

From: Peter Zijlstra
Date: Wed Feb 02 2011 - 07:45:41 EST


On Wed, 2011-02-02 at 17:20 +0530, Balbir Singh wrote:
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> [2011-02-02 12:29:20]:
>
> > On Thu, 2011-01-20 at 15:39 +0100, Peter Zijlstra wrote:
> > > On Thu, 2011-01-20 at 15:30 +0200, Stephane Eranian wrote:
> > > > @@ -4259,8 +4261,20 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
> > > >
> > > > /* Reassign the task to the init_css_set. */
> > > > task_lock(tsk);
> > > > + /*
> > > > + * we mask interrupts to prevent:
> > > > + * - timer tick to cause event rotation which
> > > > + * could schedule back in cgroup events after
> > > > + * they were switched out by perf_cgroup_sched_out()
> > > > + *
> > > > + * - preemption which could schedule back in cgroup events
> > > > + */
> > > > + local_irq_save(flags);
> > > > + perf_cgroup_sched_out(tsk);
> > > > cg = tsk->cgroups;
> > > > tsk->cgroups = &init_css_set;
> > > > + perf_cgroup_sched_in(tsk);
> > > > + local_irq_restore(flags);
> > > > task_unlock(tsk);
> > > > if (cg)
> > > > put_css_set_taskexit(cg);
> > >
> > > So you too need a callback on cgroup change there. Li, Paul, any chance
> > > we can fix this cgroup_subsys::exit callback? The scheduler code needs
> > > to do funny things because it's in the wrong place as well.
> >
> > cgroup guys? Shall I just fix this exit thing, since the only users seem
> > to be the scheduler and now perf, and for both of them it's unfortunate
> > at best?
>
> Are you suggesting that the cgroup_exit on task_exit notification should be
> pulled out?


No, just fixed. The callback as it exists isn't useful and leads to
hacks like the above.
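One way to read the "callback on cgroup change" suggestion: let the core move the task first, then hand each subsystem both the old and the new group, so perf can do its sched-out/sched-in from its own hook. The C below is a toy user-space model of that idea under exactly that assumption; `struct group`, `struct subsys`, `group_exit()` and `perf_like_exit()` are hypothetical names, not the kernel's cgroup API or whatever fix ends up being merged.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are NOT the kernel's cgroup structures. */
struct group { const char *name; };
struct task  { struct group *grp; };

/* Per-subsystem hook: told which group the task left and which it joined. */
struct subsys {
    const char *name;
    void (*exit)(struct task *t, struct group *old_grp, struct group *new_grp);
};

static struct group init_group = { "init" };

/* Core exit path: reassign the task first, then notify every subsystem.
 * A perf-like subsystem can switch its per-group events out for old_grp
 * and in for new_grp inside its own hook, so nothing subsystem-specific
 * has to be open-coded here. */
static void group_exit(struct task *t, struct subsys **subs, size_t n)
{
    assert(t != NULL && t->grp != NULL);

    struct group *old_grp = t->grp;
    t->grp = &init_group;   /* cgroup_exit() does this under task_lock() */
    for (size_t i = 0; i < n; i++)
        if (subs[i]->exit != NULL)
            subs[i]->exit(t, old_grp, t->grp);
}

/* Example subscriber that only records what it was told. */
static struct group *seen_old, *seen_new;

static void perf_like_exit(struct task *t, struct group *old_grp,
                           struct group *new_grp)
{
    (void)t;
    seen_old = old_grp;     /* perf would sched out old_grp's events here */
    seen_new = new_grp;     /* ...and sched in new_grp's events here */
}

static struct subsys perf_like = { "perf-like", perf_like_exit };
```

In this model the subsystem never has to reach into the core's exit path: the core owns the reassignment and the locking, and the hook simply gets both ends of the transition.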


> > Balbir, memcontrol.c uses pre_destroy(). I posit that using this method
> > is broken by definition, since it makes the cgroup-empty notification
> > void.
> >
>
> We use pre_destroy() to reclaim, so that delete/rmdir() will be able
> to clean up the node/group. I am not sure what you mean by "it makes
> the empty notification void", or why pre_destroy() is broken.

From a quick look at the code, it can return -EBUSY (and other errors);
in that case the rmdir of the empty cgroup will fail.

Therefore it can happen that after the last task is removed, we get the
notification that the cgroup is empty, attempt the rmdir, and fail.

This in turn means that all such notification handlers must poll state,
which is ridiculous.
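To make the polling point concrete: a minimal user-space sketch, assuming a notification handler that reacts to "cgroup is empty" by calling rmdir(2) on the group directory. `on_cgroup_empty()` and `busy_then_ok()` are hypothetical helpers; the latter simulates a pre_destroy() that answers -EBUSY a few times, so no real cgroup mount is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Anything shaped like rmdir(2): 0 on success, -1 with errno on failure. */
typedef int (*remove_fn)(const char *path);

/* Hypothetical handler for a "this cgroup is now empty" notification.
 * If the kernel side can still veto the removal with EBUSY (as a
 * pre_destroy() hook returning -EBUSY would), one rmdir() attempt is not
 * enough: the handler has to poll by retrying, backing off in between.
 * Returns 0 once the directory is gone, or -1 with errno set: the first
 * non-EBUSY error, or EBUSY if max_attempts ran out. */
static int on_cgroup_empty(const char *path, remove_fn do_remove,
                           int max_attempts, long delay_ns)
{
    assert(path != NULL && do_remove != NULL && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (do_remove(path) == 0)
            return 0;               /* removal went through */
        if (errno != EBUSY)
            return -1;              /* a real error: stop polling */
        if (attempt < max_attempts) {
            struct timespec ts = { .tv_sec = delay_ns / 1000000000L,
                                   .tv_nsec = delay_ns % 1000000000L };
            nanosleep(&ts, NULL);   /* wait, then poll again */
        }
    }
    return -1;                      /* errno is still EBUSY from the last try */
}

/* Hypothetical stand-in for a cgroup whose pre_destroy() keeps answering
 * -EBUSY until busy_left reaches zero; used instead of a real cgroupfs. */
static int busy_left;

static int busy_then_ok(const char *path)
{
    (void)path;
    if (busy_left > 0) {
        busy_left--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

In real use the handler would be passed rmdir itself, e.g. `on_cgroup_empty(path, rmdir, 50, 10000000L)`; the attempt count and delay are arbitrary. The retry loop is the state polling objected to above: without a guarantee that an empty cgroup can always be removed, every such handler ends up carrying one.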


