Re: [PATCH v3 1/7] x86/resctrl: Add multiple tasks to the resctrl group at once

From: Moger, Babu
Date: Mon Mar 20 2023 - 11:13:11 EST


Hi Reinette,

On 3/16/23 15:33, Reinette Chatre wrote:
> Hi Babu,
>
> On 3/16/2023 12:51 PM, Moger, Babu wrote:
>> On 3/16/23 12:12, Reinette Chatre wrote:
>>> On 3/16/2023 9:27 AM, Moger, Babu wrote:
>>>>> -----Original Message-----
>>>>> From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
>>>>> Sent: Wednesday, March 15, 2023 1:33 PM
>>>>> To: Moger, Babu <Babu.Moger@xxxxxxx>; corbet@xxxxxxx;
>>>>> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx
>>>>> Cc: fenghua.yu@xxxxxxxxx; dave.hansen@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
>>>>> hpa@xxxxxxxxx; paulmck@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
>>>>> quic_neeraju@xxxxxxxxxxx; rdunlap@xxxxxxxxxxxxx;
>>>>> damien.lemoal@xxxxxxxxxxxxxxxxxx; songmuchun@xxxxxxxxxxxxx;
>>>>> peterz@xxxxxxxxxxxxx; jpoimboe@xxxxxxxxxx; pbonzini@xxxxxxxxxx;
>>>>> chang.seok.bae@xxxxxxxxx; pawan.kumar.gupta@xxxxxxxxxxxxxxx;
>>>>> jmattson@xxxxxxxxxx; daniel.sneddon@xxxxxxxxxxxxxxx; Das1, Sandipan
>>>>> <Sandipan.Das@xxxxxxx>; tony.luck@xxxxxxxxx; james.morse@xxxxxxx;
>>>>> linux-doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>>>>> bagasdotme@xxxxxxxxx; eranian@xxxxxxxxxx; christophe.leroy@xxxxxxxxxx;
>>>>> jarkko@xxxxxxxxxx; adrian.hunter@xxxxxxxxx; quic_jiles@xxxxxxxxxxx;
>>>>> peternewman@xxxxxxxxxx
>>>>> Subject: Re: [PATCH v3 1/7] x86/resctrl: Add multiple tasks to the resctrl group
>>>>> at once
>>>>>
>>>>> Hi Babu,
>>>>>
>>>>> On 3/2/2023 12:24 PM, Babu Moger wrote:
>>>>>> The resctrl task assignment for MONITOR or CONTROL group needs to be
>>>>>> done one at a time. For example:
>>>>>>
>>>>>> $mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>> $mkdir /sys/fs/resctrl/clos1
>>>>>> $echo 123 > /sys/fs/resctrl/clos1/tasks
>>>>>> $echo 456 > /sys/fs/resctrl/clos1/tasks
>>>>>> $echo 789 > /sys/fs/resctrl/clos1/tasks
>>>>>>
>>>>>> This is not user-friendly when dealing with hundreds of tasks. Also,
>>>>>> there is a syscall overhead for each command executed from user space.
>>>>>
>>>>> To support this change it may also be helpful to add that moving tasks take the
>>>>> mutex so attempting to move tasks in parallel will not achieve a significant
>>>>> performance gain.
>>>>
>>>> Agree. It may not be significant performance gain. Will remove this line.
>>>
>>> It does not sound as though you are actually responding to my comment.
>>
>> I am confused. I am already saying there is syscall overhead for each
>> command if we move the tasks one by one. Now do you want me to add "moving
>> tasks take the mutex so attempting to move tasks in parallel will not
>> achieve a significant performance gain".
>>
>> It is contradictory, So, I wanted to remove the line about performance.
>> Did I still miss something?
>
> Where is the contradiction?
>
> Consider your example:
> $echo 123 > /sys/fs/resctrl/clos1/tasks
> $echo 456 > /sys/fs/resctrl/clos1/tasks
> $echo 789 > /sys/fs/resctrl/clos1/tasks
>
> Yes, there is syscall overhead for each of the above lines. My statement was in
> support of this work by stating that a user aiming to improve performance by
> attempting the above in parallel would not be able to see achieve significant
> performance gain since the calls would end up being serialized.

ok. Sure. Will add the text. I may modify little bit.
>
> You are providing two motivations (a) "user-friendly when dealing with
> hundreds of tasks", and (b) syscall overhead. Have you measured the
> improvement this solution provides?

No. I have not measured the performance improvement.

>
>>>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>>> @@ -683,16 +683,34 @@ static ssize_t rdtgroup_tasks_write(struct
>>>>> kernfs_open_file *of,
>>>>>> char *buf, size_t nbytes, loff_t off) {
>>>>>> struct rdtgroup *rdtgrp;
>>>>>> + char *pid_str;
>>>>>> int ret = 0;
>>>>>> pid_t pid;
>>>>>>
>>>>>> - if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
>>>>>> + /* Valid input requires a trailing newline */
>>>>>> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
>>>>>> return -EINVAL;
>>>>>
>>>>> The resctrl files should be seen as user space API. With the above change you
>>>>> take an interface that did not require a newline and dictate that it should have
>>>>> a trailing newline. How convinced are you that this does not break any current
>>>>> user space scripts or applications? Why does this feature require a trailing
>>>>> newline?
>>>>
>>>> I have tested these changes with intel_cmt_cat tool. It didn’t have any problems.
>>>> We are already doing newline check for few other inputs.
>>>
>>> You tested this with the _one_ user space tool that you use. This is not sufficient
>>> to be convincing that this change has no impact. I do not believe that it is a valid
>>> argument that other inputs do a newline check. This input never required a newline
>>> check and it is not clear why this change now requires it. It seems that this is an
>>> unnecessary new requirement that runs the risk of breaking an existing application.
>>>
>>> I would like to ask again: How convinced are you that this does not break _any_ current
>>> user space scripts or applications? Why does this feature require a trailing
>>> newline?

I dont think this feature required trailing newline. I may have carried
away from similar code in the area. I will remove that requirement.

>>
>> I do not know of any other tool using resctrl fs.
>> So, you want me to drop the newline requirement for this. I can try that.
>> Will let you know how it goes.
>
> You continue to avoid my question about why this requires a newline. Until
> I learn why this is required, yes, from what I can tell based on this patch
> this requirement can and should be dropped.
>
>>>>>> +
>>>>>> + buf[nbytes - 1] = '\0';
>>>>>> +
>>>>>> rdtgrp = rdtgroup_kn_lock_live(of->kn);
>>>>>> if (!rdtgrp) {
>>>>>> rdtgroup_kn_unlock(of->kn);
>>>>>> return -ENOENT;
>>>>>> }
>>>>>> +
>>>>>> +next:
>>>>>> + if (!buf || buf[0] == '\0')
>>>>>> + goto unlock;
>>>>>> +
>>>>>> + pid_str = strim(strsep(&buf, ","));
>>>>>> +
>>>>>
>>>>> Could lib/cmdline.c:get_option() be useful?
>>>>
>>>> Yes. We could that also. May not be required for the simple case like this.
>>>
>>> Please keep an eye out for how much of it you end up duplicating ....
>>
>> Using the get_options will require at least two calls(one to get the
>> length and then read the integers). Also need to allocate the integers
>> array dynamically. That is lot code if we are going that route.
>>
>
> I did not ask about get_options(), I asked about get_option().

If you insist, will use get_option. But we still have to loop thru all the
string till get_option returns 0. I can try that.

>
>>>
>>>>>> + ret = -EINVAL;
>>>>>> + goto unlock;
>>>>>> + }
>>>>>> +
>>>>>> rdt_last_cmd_clear();
>>>>>>
>>>>>> if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED || @@ -703,6
>>>>> +721,10 @@
>>>>>> static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
>>>>>> }
>>>>>>
>>>>>> ret = rdtgroup_move_task(pid, rdtgrp, of);
>>>>>> + if (ret)
>>>>>> + goto unlock;
>>>>>> + else
>>>>>> + goto next;
>>>>>>
>>>>>
>>>>> The documentation states "The failure details will be logged in
>>>>> resctrl/info/last_cmd_status file." but I do not see how this is happening here.
>>>>> From what I can tell this implementation does not do anything beyond what
>>>>> last_cmd_status already does so any special mention in the docs is not clear to
>>>>> me. The cover letter stated "Added pid in last_cmd_status when applicable." - it
>>>>> sounded as though last_cmd_status would contain the error with the pid that
>>>>> encountered the error but I do not see this happening here.
>>>>
>>>> You are right we are not doing anything special here. pid failures error was already there.
>>>> I will have to change the text here.
>>>
>>> What do you mean with "pid failures error was already there"? From what
>>> I understand your goal is to communicate to the user which pid
>>> encountered the error and I do not see that done. How will user know
>>> which pid encountered a failure?
>>
>> We only have couple of failures to take here. Those failures are already
>> handled by rdtgroup_move_task. It logs the pid for failure(using
>> rdt_last_cmd_printf).
>
> The pid is only logged if there is no task with that pid. How about the
> error in __rdtgroup_move_task() - how will the user know which pid triggered
> that error?

Yes. These cases we may be able to report the pid.

>
>>
>> I can say "The failure pid will be logged in
>> /sys/fs/resctrl/info/last_cmd_status file."
>
> That will not be accurate. Not all errors include the pid.

Can you please suggest?
Thanks
Babu Moger