Re: [PATCH v3 3/3] x86/resctrl: Implement rename op for mon groups

From: Reinette Chatre
Date: Fri Feb 10 2023 - 18:21:53 EST


Hi Peter,

On 1/25/2023 2:13 AM, Peter Newman wrote:
> To change the class of service for a large group of tasks, such as an
> application container, a container manager must write all of the tasks'
> IDs into the tasks file interface of the new control group.
>
> If a container manager is tracking containers' bandwidth usage by
> placing tasks from each into their own monitoring group, it must first
> move the tasks to the default monitoring group of the new control group
> before it can move the tasks into their new monitoring groups. This is
> undesirable because it makes bandwidth usage during the move
> unattributable to the correct tasks and resets monitoring event counters
> and cache usage information for the group.
>
> To address this, implement the rename operation for resctrlfs mon groups
> to effect a change in CLOSID for a MON group while otherwise leaving the
> monitoring group intact.
>
> Signed-off-by: Peter Newman <peternewman@xxxxxxxxxx>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 75 ++++++++++++++++++++++++++
> 1 file changed, 75 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index b2081bc1bbfd..595f83a517c6 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3238,6 +3238,80 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
> return ret;
> }
>
> +static void mongrp_move(struct rdtgroup *rdtgrp, struct rdtgroup *new_prdtgrp,
> + cpumask_var_t cpus)

Could you please add some function comments on what is moved and how it is accomplished?

> +{
> + struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
> + struct task_struct *p, *t;
> +
> + WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list));
> + list_del(&rdtgrp->mon.crdtgrp_list);
> +
> + list_add_tail(&rdtgrp->mon.crdtgrp_list,
> + &new_prdtgrp->mon.crdtgrp_list);
> + rdtgrp->mon.parent = new_prdtgrp;
> +
> + read_lock(&tasklist_lock);
> + for_each_process_thread(p, t) {
> + if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp))
> + rdt_move_one_task(t, new_prdtgrp->closid, t->rmid,
> + cpus);
> + }
> + read_unlock(&tasklist_lock);

Can rdt_move_group_tasks() be used here?

> +
> + update_closid_rmid(cpus, NULL);
> +}

I see the tasks in a monitor group handled. There is another way to create/manage
a monitor group. For example, by not writing tasks to the tasks file but instead
to write CPU ids to the CPU file. All tasks on a particular CPU will be monitored
by that group. One rule is that a MON group may only have CPUs that are owned by
the CTRL_MON group.
It is not clear to me how such a group is handled in this work.


> +
> +static int rdtgroup_rename(struct kernfs_node *kn,
> + struct kernfs_node *new_parent, const char *new_name)
> +{
> + struct rdtgroup *new_prdtgrp;
> + struct rdtgroup *rdtgrp;
> + cpumask_var_t tmpmask;
> + int ret;
> +
> + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + rdtgrp = kernfs_to_rdtgroup(kn);
> + new_prdtgrp = kernfs_to_rdtgroup(new_parent);
> + if (!rdtgrp || !new_prdtgrp) {
> + free_cpumask_var(tmpmask);
> + return -EPERM;
> + }
> +

How robust is this against user space attempting to move files?

> + /* Release both kernfs active_refs before obtaining rdtgroup mutex. */
> + rdtgroup_kn_get(rdtgrp, kn);
> + rdtgroup_kn_get(new_prdtgrp, new_parent);
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) {
> + ret = -ESRCH;
> + goto out;
> + }
> +
> + /* Only a mon group can be moved to a new mon_groups directory. */
> + if (rdtgrp->type != RDTMON_GROUP ||
> + !is_mon_groups(new_parent, kn->name)) {
> + ret = -EPERM;
> + goto out;
> + }
> +

Should in-place moves be allowed?

> + ret = kernfs_rename(kn, new_parent, new_name);
> + if (ret)
> + goto out;
> +
> + mongrp_move(rdtgrp, new_prdtgrp, tmpmask);
> +

Can tmpmask allocation/free be done in mongrp_move()?

> +out:
> + mutex_unlock(&rdtgroup_mutex);
> + rdtgroup_kn_put(rdtgrp, kn);
> + rdtgroup_kn_put(new_prdtgrp, new_parent);
> + free_cpumask_var(tmpmask);
> + return ret;
> +}
> +
> static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
> {
> if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3))
> @@ -3255,6 +3329,7 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
> static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
> .mkdir = rdtgroup_mkdir,
> .rmdir = rdtgroup_rmdir,
> + .rename = rdtgroup_rename,
> .show_options = rdtgroup_show_options,
> };
>

Reinette