Re: [PATCH v7 6/9] sched/fair: Add sched group latency support

From: Joel Fernandes
Date: Fri Nov 04 2022 - 10:57:19 EST


On Fri, Nov 04, 2022 at 03:24:23PM +0100, Vincent Guittot wrote:
> On Fri, 4 Nov 2022 at 12:21, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > On 11/03/22 18:02, Vincent Guittot wrote:
> > > On Thu, 3 Nov 2022 at 15:27, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > > >
> > > > On 11/03/22 09:46, Vincent Guittot wrote:
> > > > > On Tue, 1 Nov 2022 at 20:28, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On 10/28/22 11:34, Vincent Guittot wrote:
> > > > > > > Task can set its latency priority with sched_setattr(), which is then used
> > > > > > > to set the latency offset of its sched_enity, but sched group entities
> > > > > > > still have the default latency offset value.
> > > > > > >
> > > > > > > Add a latency.nice field in cpu cgroup controller to set the latency
> > > > > > > priority of the group similarly to sched_setattr(). The latency priority
> > > > > > > is then used to set the offset of the sched_entities of the group.
> > > > > > >
> > > > > > > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > > > > > ---
> > > > > > > Documentation/admin-guide/cgroup-v2.rst | 8 ++++
> > > > > > > kernel/sched/core.c | 52 +++++++++++++++++++++++++
> > > > > > > kernel/sched/fair.c | 33 ++++++++++++++++
> > > > > > > kernel/sched/sched.h | 4 ++
> > > > > > > 4 files changed, 97 insertions(+)
> > > > > > >
> > > > > > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > > > > > index be4a77baf784..d8ae7e411f9c 100644
> > > > > > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > > > > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > > > > > @@ -1095,6 +1095,14 @@ All time durations are in microseconds.
> > > > > > > values similar to the sched_setattr(2). This maximum utilization
> > > > > > > value is used to clamp the task specific maximum utilization clamp.
> > > > > > >
> > > > > > > + cpu.latency.nice
> > > > > > > + A read-write single value file which exists on non-root
> > > > > > > + cgroups. The default is "0".
> > > > > > > +
> > > > > > > + The nice value is in the range [-20, 19].
> > > > > > > +
> > > > > > > + This interface file allows reading and setting latency using the
> > > > > > > + same values used by sched_setattr(2).
> > > > > >
> > > > > > I'm still not sure about this [1].
> > > > >
> > > > > I'm still not sure about what you are trying to say here ...
> > > > >
> > > > > This is about setting a latency nice prio to a group level.
> > > > >
> > > > > >
> > > > > > In some scenarios we'd like to get the effective latency_nice of the task. How
> > > > > > will the task inherit the cgroup value or be impacted by it?
> > > > > >
> > > > > > For example if there are tasks that belong to a latency sensitive cgroup; and
> > > > > > I'd like to skip some searches in EAS to improve that latency sensitivity - how
> > > > > > would I extract this info in EAS path given these tasks are using default
> > > > > > latency_nice value? And if should happen if their latency_nice is set to
> > > > > > something else other than default?
> > > > > >
> > > > > > [1] https://lore.kernel.org/lkml/20221012160734.hrkb5jcjdq7r23pr@wubuntu/
> > > > >
> > > > > Hmm so you are speaking about something that is not part of the patch.
> > > > > Let focus on the patchset for now
> > > >
> > > > I am focusing on this patchset. Isn't this an essential part of the design?
> > > > Once the interface is out we can't change it. As it stands, I can't see how it
> > >
> > > So, are you speaking about the interface i.e. setting a value between [-20:19]
> >
> > About how the cgroup and per task interface interact.
> >
> > How to get the effective value of latency_nice for a task that belongs to
> > a hierarchy?
>
> At the common parents level of the 2 entities that you want to compare
> and root level if there no other entity to compare with
>
> >
> > If I have a task that has p->latency_nice = 20 but it belongs to a cgroup that
> > has tg->cpu.latency.nice = -19
>
> according to what i said above, latency_nice = 20 inside the group and
> -19 when comparing at the parent level
>
> >
> > And I want to use this interface in EAS; how should I interpret these values?
> > How should I walk up the hierarchy and decide the _effective_ latency_nice
> > value?
>
> The current use of latency_nice doesn't need to walk the hierarchy
> because it applies at each scheduling level so the childs
> automatically follow parents' latency.

Not really, I don't see how that will work that way in the wake up path. The
wake up path (EAS in particular) does not walk through CPU controller group
hierarchy from top level, it only cares about cpuset/affinities and the
"effective" values of tasks.

So when you wake up a task, how will you retrieve the attribute for 'prefer
idle' in the wakeup path using this patchset? The only way is to aggregate
the CGroup hierarchy information to get a per-task effective value; say using
a min function.

If you see uclamp_rq_util_with(), that also is using doing uclamp
aggregation similarly.

So I think Qais is asking about the aggregation function in the EAS wakeup
path.

Thanks.