Re: [PATCH v4 3/3] x86/resctrl: Implement rename op for mon groups

From: Reinette Chatre
Date: Thu Mar 23 2023 - 14:08:42 EST

Next message: Jonathan Corbet: "Re: [PATCH v3] docs: describe how to quickly build a trimmed kernel"
Previous message: Yosry Ahmed: "Re: [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Peter,

On 3/8/2023 5:14 AM, Peter Newman wrote:
> To change the class of service for a large group of tasks, such as an

how about "class of service for" -> "resources allocated to" (see later
comment)

> application container, a container manager must write all of the tasks'
> IDs into the tasks file interface of the new control group.
>
> If a container manager is tracking containers' bandwidth usage by

"is tracking" -> "is additionally"

> placing tasks from each into their own monitoring group, it must first
> move the tasks to the default monitoring group of the new control group
> before it can move the tasks into their new monitoring groups. This is
> undesirable because it makes bandwidth usage during the move
> unattributable to the correct tasks and resets monitoring event counters
> and cache usage information for the group.
>
> To address this, implement the rename operation for resctrlfs mon groups
> to effect a change in CLOSID for a MON group while otherwise leaving the
> monitoring group intact.

As I understand the main motivation of this work is to change the
resources allocated to a large number of tasks without impacting the
monitoring data. You do mention "changing the class of service" and "change
in CLOSID" but I think a simpler motivation can help to support this work.

Together with the earlier snippets, how about the final paragraph reads:

"Implement the rename operation only for resctrlfs monitor groups to enable
users to move a monitoring group from one control group to another. This
effects a change in resources allocated to all the tasks in the monitoring
group while otherwise leaving the monitoring data intact."

>
> Signed-off-by: Peter Newman <peternewman@xxxxxxxxxx>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 127 +++++++++++++++++++++++++
> 1 file changed, 127 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 84af23a29612..6d576013fc16 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3256,6 +3256,132 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
> return ret;
> }
>
> +static bool mongrp_reparent_match(struct task_struct *t, struct rdtgroup *from)

As a follow-up on comments in previous patch, please name the function to reflect
what the does as opposed to what the caller does when using it.

> +{
> + WARN_ON(from->type != RDTMON_GROUP);
> +
> + /* RMID match implies CLOSID match. */

I find this comment confusing considering how this function is used. The resource
group's CLOSID was changed and this function is used to find tasks that need their
CLOSID updated. Stating here that the RMID implies a CLOSID match when the
CLOSID is not expected to match in the usage is unclear to me.

> + return is_rmid_match(t, from);
> +}
> +
> +/**
> + * mongrp_reparent() - replace parent CTRL_MON group of a MON group
> + * @rdtgrp: the MON group whose parent should be replaced
> + * @new_prdtgrp: replacement parent CTRL_MON group for @rdtgrp
> + * @cpus: cpumask provided by the caller for use during this call
> + *
> + * Replaces the parent CTRL_MON group for a MON group, resulting in all member
> + * tasks' CLOSID immediately changing to that of the new parent group.
> + * Monitoring data for the group is unaffected by this operation.
> + */
> +static void mongrp_reparent(struct rdtgroup *rdtgrp,
> + struct rdtgroup *new_prdtgrp,
> + cpumask_var_t cpus)
> +{
> + struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
> +
> + WARN_ON(rdtgrp->type != RDTMON_GROUP);
> + WARN_ON(new_prdtgrp->type != RDTCTRL_GROUP);
> +
> + /* Nothing to do when simply renaming a MON group. */
> + if (prdtgrp == new_prdtgrp)
> + return;
> +
> + WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list));
> + list_del(&rdtgrp->mon.crdtgrp_list);
> +
> + list_add_tail(&rdtgrp->mon.crdtgrp_list,
> + &new_prdtgrp->mon.crdtgrp_list);

Could list_move_tail() be used here?

> + rdtgrp->mon.parent = new_prdtgrp;
> + rdtgrp->closid = new_prdtgrp->closid;
> +
> + /* Propagate updated closid to all tasks in this group. */
> + rdt_move_group_tasks(rdtgrp, rdtgrp, cpus, mongrp_reparent_match);
> +
> + update_closid_rmid(cpus, NULL);
> +}
> +
> +static int rdtgroup_rename(struct kernfs_node *kn,
> + struct kernfs_node *new_parent, const char *new_name)
> +{
> + struct rdtgroup *new_prdtgrp;
> + struct rdtgroup *rdtgrp;
> + cpumask_var_t tmpmask;
> + int ret;
> +
> + /*
> + * Don't allow kernfs_to_rdtgroup() to return a parent rdtgroup if
> + * either kernfs_node is a file.
> + */
> + if (kernfs_type(kn) != KERNFS_DIR ||
> + kernfs_type(new_parent) != KERNFS_DIR)
> + return -EPERM;

This would be one scenario where the user may attempt an interaction
with resctrl that results in an error while peeking at "last_cmd_status"
will report "ok". This is not the only case in which this may happen and
I think the code is ok. To help users to not need to read the kernel code
to understand what is going on, could a snippet about this feature be added
to the "Resource alloc and monitor groups" section in
Documentation/x86/rescrl.rst. It does not have to be elaborate
but in the area where directory removal is discussed there could be
a snippet that documents this new feature.

> +
> + rdtgrp = kernfs_to_rdtgroup(kn);
> + new_prdtgrp = kernfs_to_rdtgroup(new_parent);
> + if (!rdtgrp || !new_prdtgrp)
> + return -EPERM;

Can this be ENOENT to be consistent with this error encountered in
existing resctrl interactions? Although I see that rmdir currently
returns EPERM also but it is an outlier with that usage. ENODEV may
work also to match with the mkdir return - I do not know why it
was done differently.

> +
> + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
> + return -ENOMEM;

It remains strange to do the allocation here. I understand its usage so maybe
just a comment like: "Perform early allocation as part
of ensuring the later resource group move cannot fail."

> +
> + /* Release both kernfs active_refs before obtaining rdtgroup mutex. */
> + rdtgroup_kn_get(rdtgrp, kn);
> + rdtgroup_kn_get(new_prdtgrp, new_parent);
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) {
> + ret = -ESRCH;
> + goto out;
> + }

Same here, ENOENT?

> +
> + if (rdtgrp->type != RDTMON_GROUP || !kn->parent ||
> + !is_mon_groups(kn->parent, kn->name)) {
> + rdt_last_cmd_puts("source must be a MON group\n");

Please start the message with a capital letter.

> + ret = -EPERM;
> + goto out;
> + }
> +
> + if (!is_mon_groups(new_parent, new_name)) {
> + rdt_last_cmd_puts("destination must be a mon_groups subdirectory\n");

Also here a capital letter please.

> + ret = -EPERM;
> + goto out;
> + }
> +
> + /*
> + * If the MON group is monitoring CPUs, they must be assigned to the

Could this please be specific about what is meant with "they"?

> + * current parent CTRL_MON group and therefore cannot be assigned to
> + * the new parent, making the move illegal.
> + */
> + if (!cpumask_empty(&rdtgrp->cpu_mask) &&
> + (rdtgrp->mon.parent != new_prdtgrp)) {
> + rdt_last_cmd_puts("cannot move a MON group that monitors CPUs\n");

Also here a capital letter please.

> + ret = -EPERM;
> + goto out;
> + }
> +
> + /*
> + * Perform all input validation needed to ensure mongrp_reparent() will
> + * succeed before calling kernfs_rename(), otherwise it would be
> + * necessary to revert this call if mongrp_reparent() failed.
> + */
> + ret = kernfs_rename(kn, new_parent, new_name);
> + if (ret)
> + goto out;
> +
> + mongrp_reparent(rdtgrp, new_prdtgrp, tmpmask);
> +
> +out:
> + mutex_unlock(&rdtgroup_mutex);
> + rdtgroup_kn_put(rdtgrp, kn);
> + rdtgroup_kn_put(new_prdtgrp, new_parent);
> + free_cpumask_var(tmpmask);
> + return ret;
> +}
> +
> static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
> {
> if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3))
> @@ -3273,6 +3399,7 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
> static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
> .mkdir = rdtgroup_mkdir,
> .rmdir = rdtgroup_rmdir,
> + .rename = rdtgroup_rename,
> .show_options = rdtgroup_show_options,
> };
>

This looks good. Thank you.

Reinette

Next message: Jonathan Corbet: "Re: [PATCH v3] docs: describe how to quickly build a trimmed kernel"
Previous message: Yosry Ahmed: "Re: [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]