Re: [RFC PATCH 1/9] sched,cgroup: Add interface for latency-nice

From: Parth Shah
Date: Fri Sep 06 2019 - 08:22:51 EST




On 9/5/19 3:41 PM, Patrick Bellasi wrote:
>
> On Thu, Sep 05, 2019 at 07:15:34 +0100, Parth Shah wrote...
>
>> On 9/4/19 11:02 PM, Tim Chen wrote:
>>> On 8/30/19 10:49 AM, subhra mazumdar wrote:
>>>> Add Cgroup interface for latency-nice. Each CPU Cgroup adds a new file
>>>> "latency-nice" which is shared by all the threads in that Cgroup.
>>>
>>>
>>> Subhra,
>>>
>>> Thanks for posting the patchset. Having a latency nice hint
>>> is useful beyond idle load balancing. I can think of other
>>> application scenarios, like scheduling batch machine learning AVX512
>>> processes with latency sensitive processes. AVX512 limits the frequency
>>> of the CPU and it is best to avoid latency sensitive tasks on the
>>> same core as AVX512 tasks. So a latency nice hint gives the scheduler
>>> a criterion to determine the latency sensitivity of a task
>>> and arrange latency sensitive tasks away from AVX512 tasks.
>>>
>>
>>
>> Hi Tim and Subhra,
>>
>> This patchset seems interesting for my TurboSched patches as well,
>> where I try to pack jitter tasks on fewer cores to get higher Turbo frequencies.
>> The problem I face is that we sometimes end up putting multiple jitter tasks on a core
>> running some latency sensitive application, which may then see performance degradation.
>> So my plan was to classify such tasks as latency sensitive, thereby hinting the load
>> balancer not to put tasks on such cores.
>>
>> TurboSched: https://lkml.org/lkml/2019/7/25/296
>>
>>> You configure the latency hint on a cgroup basis.
>>> But I think not all tasks in a cgroup necessarily have the same
>>> latency sensitivity.
>>>
>>> For example, I can see that cgroup can be applied on a per user basis,
>>> and the user could run different tasks that have different latency sensitivity.
>>> We may also need a way to configure latency sensitivity on a per task basis instead of on
>>> a per cgroup basis.
>>>
>>
>> AFAIU, the problem defined above intersects with my patches as well, where an interface
>> is required to classify the jitter tasks. I have already tried a few methods like
>> syscall and cgroup to classify such tasks and maybe something like that can be adopted
>> with this patchset as well.
>
> Agree, these two patchsets are definitely overlapping in terms of
> mechanisms and APIs to expose to userspace. You two guys seem to target
> different goals but the general approach should be:
>
> - expose a single and abstract concept to user-space:
>   latency-nice or latency-tolerant, as PaulT proposed at OSPM
>

I agree. Both patchsets try to classify tasks in order to get better latency.
TurboSched needs to know whether a task is jitter, i.e. one that should not be given
much in the way of resources/frequency. This is a boolean value.
latency-nice, on the other hand, is a range. So does that mean that a max-latency-nice task is a jitter task?

I was thinking of not packing jitter tasks onto a core occupied by a
min-latency-nice (i.e., latency sensitive) task, as long as other busier cores are available.

Given this, we can expose a single per-task attribute to the user by a syscall, right?
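
Roughly, what I have in mind could be sketched in user-space C as below. This
is only an illustration, not code from either patchset: the [-20, 19] range,
struct core_info and all function names are assumptions made up for this
example, and treating max-latency-nice as jitter is just one possible answer
to the question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed range, mirroring nice(2); the real range is still under discussion. */
#define LATENCY_NICE_MIN (-20)
#define LATENCY_NICE_MAX 19

/* Hypothetical per-core summary for illustration; not a kernel structure. */
struct core_info {
    int nr_running;          /* number of runnable tasks on the core */
    int lowest_latency_nice; /* lowest latency-nice value among those tasks */
};

/* One possible mapping from the range to TurboSched's boolean: max == jitter. */
bool task_is_jitter(int latency_nice)
{
    return latency_nice == LATENCY_NICE_MAX;
}

/* A core counts as latency sensitive if it runs a min-latency-nice task. */
bool core_is_latency_sensitive(const struct core_info *c)
{
    return c->nr_running > 0 && c->lowest_latency_nice == LATENCY_NICE_MIN;
}

/*
 * Return the index of the busiest core that hosts no latency sensitive task,
 * or -1 if every core hosts one (the caller then decides what to do).
 */
int pick_pack_core(const struct core_info *cores, size_t n)
{
    int best = -1;

    assert(cores != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (core_is_latency_sensitive(&cores[i]))
            continue; /* do not pack jitter next to a latency sensitive task */
        if (best < 0 || cores[i].nr_running > cores[best].nr_running)
            best = (int)i;
    }
    return best;
}
```

With three cores where the busiest one runs a min-latency-nice task, the
sketch skips that core and packs onto the next busiest one instead.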

> - map this concept in kernel-space to different kinds of bias, both at
>   wakeup time and load-balance time, and use both for RT and CFS tasks.
>
> That's my understanding at least ;)
>
> I guess we will have interesting discussions at the upcoming LPC to
> figure out a solution fitting all needs.

Definitely.

>
>> Thanks,
>> Parth
>
> Best,
> Patrick
>