Re: [PATCH v3 1/7] x86/resctrl: Add multiple tasks to the resctrl group at once

From: Reinette Chatre
Date: Thu Mar 16 2023 - 16:33:45 EST


Hi Babu,

On 3/16/2023 12:51 PM, Moger, Babu wrote:
> On 3/16/23 12:12, Reinette Chatre wrote:
>> On 3/16/2023 9:27 AM, Moger, Babu wrote:
>>>> -----Original Message-----
>>>> From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
>>>> Sent: Wednesday, March 15, 2023 1:33 PM
>>>> To: Moger, Babu <Babu.Moger@xxxxxxx>; corbet@xxxxxxx;
>>>> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx
>>>> Cc: fenghua.yu@xxxxxxxxx; dave.hansen@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
>>>> hpa@xxxxxxxxx; paulmck@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
>>>> quic_neeraju@xxxxxxxxxxx; rdunlap@xxxxxxxxxxxxx;
>>>> damien.lemoal@xxxxxxxxxxxxxxxxxx; songmuchun@xxxxxxxxxxxxx;
>>>> peterz@xxxxxxxxxxxxx; jpoimboe@xxxxxxxxxx; pbonzini@xxxxxxxxxx;
>>>> chang.seok.bae@xxxxxxxxx; pawan.kumar.gupta@xxxxxxxxxxxxxxx;
>>>> jmattson@xxxxxxxxxx; daniel.sneddon@xxxxxxxxxxxxxxx; Das1, Sandipan
>>>> <Sandipan.Das@xxxxxxx>; tony.luck@xxxxxxxxx; james.morse@xxxxxxx;
>>>> linux-doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>>>> bagasdotme@xxxxxxxxx; eranian@xxxxxxxxxx; christophe.leroy@xxxxxxxxxx;
>>>> jarkko@xxxxxxxxxx; adrian.hunter@xxxxxxxxx; quic_jiles@xxxxxxxxxxx;
>>>> peternewman@xxxxxxxxxx
>>>> Subject: Re: [PATCH v3 1/7] x86/resctrl: Add multiple tasks to the resctrl group
>>>> at once
>>>>
>>>> Hi Babu,
>>>>
>>>> On 3/2/2023 12:24 PM, Babu Moger wrote:
>>>>> The resctrl task assignment for MONITOR or CONTROL group needs to be
>>>>> done one at a time. For example:
>>>>>
>>>>> $mount -t resctrl resctrl /sys/fs/resctrl/
>>>>> $mkdir /sys/fs/resctrl/clos1
>>>>> $echo 123 > /sys/fs/resctrl/clos1/tasks
>>>>> $echo 456 > /sys/fs/resctrl/clos1/tasks
>>>>> $echo 789 > /sys/fs/resctrl/clos1/tasks
>>>>>
>>>>> This is not user-friendly when dealing with hundreds of tasks. Also,
>>>>> there is a syscall overhead for each command executed from user space.
>>>>
>>>> To support this change it may also be helpful to add that moving tasks take the
>>>> mutex so attempting to move tasks in parallel will not achieve a significant
>>>> performance gain.
>>>
>>> Agree. It may not be significant performance gain. Will remove this line.
>>
>> It does not sound as though you are actually responding to my comment.
>
> I am confused. I am already saying there is syscall overhead for each
> command if we move the tasks one by one. Now do you want me to add "moving
> tasks take the mutex so attempting to move tasks in parallel will not
> achieve a significant performance gain".
>
> It is contradictory, So, I wanted to remove the line about performance.
> Did I still miss something?

Where is the contradiction?

Consider your example:
$echo 123 > /sys/fs/resctrl/clos1/tasks
$echo 456 > /sys/fs/resctrl/clos1/tasks
$echo 789 > /sys/fs/resctrl/clos1/tasks

Yes, there is syscall overhead for each of the above lines. My statement was in
support of this work by stating that a user aiming to improve performance by
attempting the above in parallel would not be able to see achieve significant
performance gain since the calls would end up being serialized.

You are providing two motivations (a) "user-friendly when dealing with
hundreds of tasks", and (b) syscall overhead. Have you measured the
improvement this solution provides?

>>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> @@ -683,16 +683,34 @@ static ssize_t rdtgroup_tasks_write(struct
>>>> kernfs_open_file *of,
>>>>> char *buf, size_t nbytes, loff_t off) {
>>>>> struct rdtgroup *rdtgrp;
>>>>> + char *pid_str;
>>>>> int ret = 0;
>>>>> pid_t pid;
>>>>>
>>>>> - if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
>>>>> + /* Valid input requires a trailing newline */
>>>>> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
>>>>> return -EINVAL;
>>>>
>>>> The resctrl files should be seen as user space API. With the above change you
>>>> take an interface that did not require a newline and dictate that it should have
>>>> a trailing newline. How convinced are you that this does not break any current
>>>> user space scripts or applications? Why does this feature require a trailing
>>>> newline?
>>>
>>> I have tested these changes with intel_cmt_cat tool. It didn’t have any problems.
>>> We are already doing newline check for few other inputs.
>>
>> You tested this with the _one_ user space tool that you use. This is not sufficient
>> to be convincing that this change has no impact. I do not believe that it is a valid
>> argument that other inputs do a newline check. This input never required a newline
>> check and it is not clear why this change now requires it. It seems that this is an
>> unnecessary new requirement that runs the risk of breaking an existing application.
>>
>> I would like to ask again: How convinced are you that this does not break _any_ current
>> user space scripts or applications? Why does this feature require a trailing
>> newline?
>
> I do not know of any other tool using resctrl fs.
> So, you want me to drop the newline requirement for this. I can try that.
> Will let you know how it goes.

You continue to avoid my question about why this requires a newline. Until
I learn why this is required, yes, from what I can tell based on this patch
this requirement can and should be dropped.

>>>>> +
>>>>> + buf[nbytes - 1] = '\0';
>>>>> +
>>>>> rdtgrp = rdtgroup_kn_lock_live(of->kn);
>>>>> if (!rdtgrp) {
>>>>> rdtgroup_kn_unlock(of->kn);
>>>>> return -ENOENT;
>>>>> }
>>>>> +
>>>>> +next:
>>>>> + if (!buf || buf[0] == '\0')
>>>>> + goto unlock;
>>>>> +
>>>>> + pid_str = strim(strsep(&buf, ","));
>>>>> +
>>>>
>>>> Could lib/cmdline.c:get_option() be useful?
>>>
>>> Yes. We could that also. May not be required for the simple case like this.
>>
>> Please keep an eye out for how much of it you end up duplicating ....
>
> Using the get_options will require at least two calls(one to get the
> length and then read the integers). Also need to allocate the integers
> array dynamically. That is lot code if we are going that route.
>

I did not ask about get_options(), I asked about get_option().

>>
>>>>> + ret = -EINVAL;
>>>>> + goto unlock;
>>>>> + }
>>>>> +
>>>>> rdt_last_cmd_clear();
>>>>>
>>>>> if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED || @@ -703,6
>>>> +721,10 @@
>>>>> static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
>>>>> }
>>>>>
>>>>> ret = rdtgroup_move_task(pid, rdtgrp, of);
>>>>> + if (ret)
>>>>> + goto unlock;
>>>>> + else
>>>>> + goto next;
>>>>>
>>>>
>>>> The documentation states "The failure details will be logged in
>>>> resctrl/info/last_cmd_status file." but I do not see how this is happening here.
>>>> From what I can tell this implementation does not do anything beyond what
>>>> last_cmd_status already does so any special mention in the docs is not clear to
>>>> me. The cover letter stated "Added pid in last_cmd_status when applicable." - it
>>>> sounded as though last_cmd_status would contain the error with the pid that
>>>> encountered the error but I do not see this happening here.
>>>
>>> You are right we are not doing anything special here. pid failures error was already there.
>>> I will have to change the text here.
>>
>> What do you mean with "pid failures error was already there"? From what
>> I understand your goal is to communicate to the user which pid
>> encountered the error and I do not see that done. How will user know
>> which pid encountered a failure?
>
> We only have couple of failures to take here. Those failures are already
> handled by rdtgroup_move_task. It logs the pid for failure(using
> rdt_last_cmd_printf).

The pid is only logged if there is no task with that pid. How about the
error in __rdtgroup_move_task() - how will the user know which pid triggered
that error?

>
> I can say "The failure pid will be logged in
> /sys/fs/resctrl/info/last_cmd_status file."

That will not be accurate. Not all errors include the pid.

Reinette