Re: [PATCH 08/12] x86/cqm: Add support for monitoring task and cgroup together

From: Thomas Gleixner
Date: Tue Jan 17 2017 - 13:14:38 EST


On Fri, 6 Jan 2017, Vikas Shivappa wrote:

> From: Vikas Shivappa<vikas.shivappa@xxxxxxxxx>
>
> This patch adds support to monitor a cgroup x and a task p1
> when p1 is part of cgroup x. Since we cannot write two RMIDs during
> sched in the driver handles this.

Again you explain WHAT not WHY....

> This patch introduces a u32 *rmid in the task_struck which keeps track
> of the RMIDs associated with the task. There is also a list in the
> arch_info of perf_cgroup called taskmon_list which keeps track of tasks
> in the cgroup that are monitored.
>
> The taskmon_list is modified in 2 scenarios.
> - at event_init of task p1 which is part of a cgroup, add the p1 to the
> cgroup->tskmon_list. At event_destroy delete the task from the list.
> - at the time of task move from cgrp x to a cgp y, if task was monitored
> remove the task from the cgrp x tskmon_list and add it the cgrp y
> tskmon_list.
>
> sched in: When the task p1 is scheduled in, we write the task RMID in
> the PQR_ASSOC MSR

Great information.

> read(for task p1): As any other cqm task event
>
> read(for the cgroup x): When counting for cgroup, the taskmon list is
> traversed and the corresponding RMID counts are added.
>
> Tests: Monitoring a cgroup x and a task with in the cgroup x should
> work.

Emphasis on should.

> +static inline int add_cgrp_tskmon_entry(u32 *rmid, struct list_head *l)
> +{
> + struct tsk_rmid_entry *entry;
> +
> + entry = kzalloc(sizeof(struct tsk_rmid_entry), GFP_KERNEL);
> + if (!entry)
> + return -ENOMEM;
> +
> + INIT_LIST_HEAD(&entry->list);
> + entry->rmid = rmid;
> +
> + list_add_tail(&entry->list, l);
> +
> + return 0;

And this function has a return value because the return value is never
evaluated at any call site.

> +}
> +
> +static inline void del_cgrp_tskmon_entry(u32 *rmid, struct list_head *l)
> +{
> + struct tsk_rmid_entry *entry = NULL, *tmp1;

And where is *tmp2? What is wrong with simply *tmp?

> +
> + list_for_each_entry_safe(entry, tmp1, l, list) {
> + if (entry->rmid == rmid) {
> +
> + list_del(&entry->list);
> + kfree(entry);
> + break;
> + }
> + }
> +}
> +
> #ifdef CONFIG_CGROUP_PERF
> struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk)
> {
> @@ -380,6 +410,49 @@ struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk)
> }
> #endif
>
> +static inline void
> + cgrp_tskmon_update(struct task_struct *tsk, u32 *rmid, bool ena)

Sigh

> +{
> + struct cgrp_cqm_info *ccinfo = NULL;
> +
> +#ifdef CONFIG_CGROUP_PERF
> + ccinfo = cqminfo_from_tsk(tsk);
> +#endif
> + if (!ccinfo)
> + return;
> +
> + if (ena)
> + add_cgrp_tskmon_entry(rmid, &ccinfo->tskmon_rlist);
> + else
> + del_cgrp_tskmon_entry(rmid, &ccinfo->tskmon_rlist);
> +}
> +
> +static int cqm_assign_task_rmid(struct perf_event *event, u32 *rmid)
> +{
> + struct task_struct *tsk;
> + int ret = 0;
> +
> + rcu_read_lock();
> + tsk = event->hw.target;
> + if (pid_alive(tsk)) {
> + get_task_struct(tsk);

This works because after the pid_alive() check the task cannot be
released before issuing get_task_struct(), right?

That's voodoo protection. How would a non alive task end up as event
target?

> +
> + if (rmid != NULL)
> + cgrp_tskmon_update(tsk, rmid, true);
> + else
> + cgrp_tskmon_update(tsk, tsk->rmid, false);
> +
> + tsk->rmid = rmid;
> +
> + put_task_struct(tsk);
> + } else {
> + ret = -EINVAL;
> + }
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> static inline void cqm_enable_mon(struct cgrp_cqm_info *cqm_info, u32 *rmid)
> {
> if (rmid != NULL) {
> @@ -429,8 +502,12 @@ static void cqm_assign_hier_rmid(struct cgroup_subsys_state *rcss, u32 *rmid)
>
> static int cqm_assign_rmid(struct perf_event *event, u32 *rmid)
> {
> + if (is_task_event(event)) {
> + if (cqm_assign_task_rmid(event, rmid))
> + return -EINVAL;
> + }
> #ifdef CONFIG_CGROUP_PERF
> - if (is_cgroup_event(event)) {
> + else if (is_cgroup_event(event)) {
> cqm_assign_hier_rmid(&event->cgrp->css, rmid);
> }

So you keep adding stuff to cqm_assign_rmid() which handles enable and
disable. But the only call site is in cqm_event_free_rmid() which calls
that function with rmid = NULL, i.e. disable.

Can you finally explain how this is supposed to work and how all of this
has been tested and validated?

If you had used the already known 'Tests: Same as before' line to the
changelog, then we would have known that it's broken as before w/o looking
at the patch.

So the new variant of 'broken' is: Bla should work ....

Thanks,

tglx