Re: [PATCH] x86/resctrl: Only show tasks' pids in current pid namespace

From: Reinette Chatre
Date: Thu Mar 16 2023 - 17:41:31 EST


Hi Shawn,

On 3/15/2023 8:06 AM, Shawn Wang wrote:
> On 2/16/23 5:43 AM, Reinette Chatre wrote:
>> On 1/15/2023 11:12 PM, Shawn Wang wrote:
>>> When writing a task id to the "tasks" file in an rdtgroup,
>>> rdtgroup_tasks_write() treats the pid as a number in the current pid
>>> namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
>>> the list of global pids from the init namespace. If current pid namespace
>>> is not the init namespace, pids in "tasks" will be confusing and incorrect.
>>>
>>> To be more robust, let the "tasks" file only show pids in the current pid
>>> namespace.
>>>
>>
>> Is it possible to elaborate more on the use case that this is aiming to
>> address? It is unexpected to me that resource management is approached from
>> within a container. My expectation is that the resource management and monitoring
>> is done from the host.
>
> We have a scenario where we only want to mount the resctrl filesystem under a specific container.

This scenario is interesting to me. My assumption has always been that resource
management is done from the host and not from a container, especially since a
container can only add its own tasks to resource groups.

> And we found that the pids in the tasks file under resctrl are inconsistent with the pids obtained by top.

Indeed.

>
> Besides, current rdtgroup_move_task() uses the find_task_by_vpid() to get the real pid.
> Our modification is also to maintain symmetry with the rdtgroup_move_task().

I understand, thank you for looking into this.

>
>>> ---
>>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 5993da21d822..9e97ae24c159 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -718,11 +718,15 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
>>>   static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
>>>   {
>>>       struct task_struct *p, *t;
>>> +    pid_t pid;
>>>
>>>       rcu_read_lock();
>>>       for_each_process_thread(p, t) {
>>> -        if (is_closid_match(t, r) || is_rmid_match(t, r))
>>> -            seq_printf(s, "%d\n", t->pid);
>>> +        if (is_closid_match(t, r) || is_rmid_match(t, r)) {
>>> +            pid = task_pid_vnr(t);
>>> +            if (pid)
>>> +                seq_printf(s, "%d\n", pid);
>>> +        }
>>>       }
>>>       rcu_read_unlock();
>>>   }
>>
>> This looks like it would solve the stated problem. Does it slow down
>> reading a tasks file in a measurable way?
>
> We didn't test it, but it is proportional to the number of pids in the group.
> In addition, only an if statement is added here, and actually the reading of
> the tasks interface will not be called frequently, so it will not be a bottleneck.

It adds more than an if statement: for the default root group, task_pid_vnr() will
be called for every task on the host. I am not familiar with namespaces, so my concern
was the additional task_pid_vnr() call. This does seem to be the customary approach, though,
and it does what is needed to return the correct data.
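
For readers less familiar with pid namespaces, the behaviour the patch relies on can be
modelled in plain userspace C. This is only an illustrative sketch, not kernel code: the
names model_task, model_ns, model_pid_vnr() and model_show_tasks() are hypothetical, and
the flat arrays are an assumed simplification of the kernel's per-level pid numbers.
It shows the two properties discussed above: the translation runs once per matching task,
and a task that is not visible from the reader's namespace translates to 0 and is skipped.

```c
#include <assert.h>

#define MAX_LEVELS 4

/* Hypothetical stand-in for a task's pid: one number per namespace level,
 * with level 0 playing the role of the init (host) namespace. */
struct model_task {
	int level;              /* depth of the namespace the task lives in */
	int ns_id[MAX_LEVELS];  /* identity of the namespace at each level */
	int nr[MAX_LEVELS];     /* pid number as seen from each level */
};

/* Hypothetical stand-in for the reader's pid namespace. */
struct model_ns {
	int level;
	int id;
};

/* Models the role of task_pid_vnr(): the pid of @t as seen from @ns,
 * or 0 when @t is not visible from @ns. */
int model_pid_vnr(const struct model_task *t, const struct model_ns *ns)
{
	assert(ns->level >= 0 && ns->level < MAX_LEVELS);

	if (ns->level <= t->level && t->ns_id[ns->level] == ns->id)
		return t->nr[ns->level];
	return 0;
}

/* Models the patched loop in show_rdt_tasks(): translate every task and
 * keep only those with a non-zero pid in the reader's namespace.
 * Returns the number of pids stored in @out. */
int model_show_tasks(const struct model_task *tasks, int n,
		     const struct model_ns *ns, int *out, int out_len)
{
	int count = 0;

	for (int i = 0; i < n && count < out_len; i++) {
		int pid = model_pid_vnr(&tasks[i], ns);

		if (pid)
			out[count++] = pid;
	}
	return count;
}
```

In this model a reader at level 0 gets every task with its host pid, while a reader in a
nested namespace gets only the tasks of that namespace with their local pids, which is the
difference between the old and the new contents of the "tasks" file.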

I did test this and can confirm that when bind mounting /sys/fs/resctrl into the container,
the container's view of /sys/fs/resctrl/tasks only shows its own tasks, with the pids as seen
by it. Without this patch, both the container and the host show the same data, namely the
pids from the host namespace.

Tested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
Acked-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>

When you no longer expect any more feedback I'd recommend that you resubmit this
patch with the new tags to make it easier for the next level maintainers to notice
it and pick it up. To ensure accurate references to discussions you can add a
"Link:" to this email.

Thank you very much

Reinette