Re: [PATCH v7 6/9] sched/fair: Add sched group latency support

From: Vincent Guittot
Date: Fri Nov 04 2022 - 11:03:36 EST


On Fri, 4 Nov 2022 at 15:57, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 04, 2022 at 03:24:23PM +0100, Vincent Guittot wrote:
> > On Fri, 4 Nov 2022 at 12:21, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > >
> > > On 11/03/22 18:02, Vincent Guittot wrote:
> > > > On Thu, 3 Nov 2022 at 15:27, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > > > >
> > > > > On 11/03/22 09:46, Vincent Guittot wrote:
> > > > > > On Tue, 1 Nov 2022 at 20:28, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On 10/28/22 11:34, Vincent Guittot wrote:
> > > > > > > > Task can set its latency priority with sched_setattr(), which is then used
> > > > > > > > to set the latency offset of its sched_enity, but sched group entities
> > > > > > > > still have the default latency offset value.
> > > > > > > >
> > > > > > > > Add a latency.nice field in cpu cgroup controller to set the latency
> > > > > > > > priority of the group similarly to sched_setattr(). The latency priority
> > > > > > > > is then used to set the offset of the sched_entities of the group.
> > > > > > > >
> > > > > > > > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > Documentation/admin-guide/cgroup-v2.rst | 8 ++++
> > > > > > > > kernel/sched/core.c | 52 +++++++++++++++++++++++++
> > > > > > > > kernel/sched/fair.c | 33 ++++++++++++++++
> > > > > > > > kernel/sched/sched.h | 4 ++
> > > > > > > > 4 files changed, 97 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > > > > > > index be4a77baf784..d8ae7e411f9c 100644
> > > > > > > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > > > > > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > > > > > > @@ -1095,6 +1095,14 @@ All time durations are in microseconds.
> > > > > > > > values similar to the sched_setattr(2). This maximum utilization
> > > > > > > > value is used to clamp the task specific maximum utilization clamp.
> > > > > > > >
> > > > > > > > + cpu.latency.nice
> > > > > > > > + A read-write single value file which exists on non-root
> > > > > > > > + cgroups. The default is "0".
> > > > > > > > +
> > > > > > > > + The nice value is in the range [-20, 19].
> > > > > > > > +
> > > > > > > > + This interface file allows reading and setting latency using the
> > > > > > > > + same values used by sched_setattr(2).
> > > > > > >
> > > > > > > I'm still not sure about this [1].
> > > > > >
> > > > > > I'm still not sure about what you are trying to say here ...
> > > > > >
> > > > > > This is about setting a latency nice prio to a group level.
> > > > > >
> > > > > > >
> > > > > > > In some scenarios we'd like to get the effective latency_nice of the task. How
> > > > > > > will the task inherit the cgroup value or be impacted by it?
> > > > > > >
> > > > > > > For example if there are tasks that belong to a latency sensitive cgroup; and
> > > > > > > I'd like to skip some searches in EAS to improve that latency sensitivity - how
> > > > > > > would I extract this info in EAS path given these tasks are using default
> > > > > > > latency_nice value? And if should happen if their latency_nice is set to
> > > > > > > something else other than default?
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/lkml/20221012160734.hrkb5jcjdq7r23pr@wubuntu/
> > > > > >
> > > > > > Hmm so you are speaking about something that is not part of the patch.
> > > > > > Let focus on the patchset for now
> > > > >
> > > > > I am focusing on this patchset. Isn't this an essential part of the design?
> > > > > Once the interface is out we can't change it. As it stands, I can't see how it
> > > >
> > > > So, are you speaking about the interface i.e. setting a value between [-20:19]
> > >
> > > About how the cgroup and per task interface interact.
> > >
> > > How to get the effective value of latency_nice for a task that belongs to
> > > a hierarchy?
> >
> > At the common parents level of the 2 entities that you want to compare
> > and root level if there no other entity to compare with
> >
> > >
> > > If I have a task that has p->latency_nice = 20 but it belongs to a cgroup that
> > > has tg->cpu.latency.nice = -19
> >
> > according to what i said above, latency_nice = 20 inside the group and
> > -19 when comparing at the parent level
> >
> > >
> > > And I want to use this interface in EAS; how should I interpret these values?
> > > How should I walk up the hierarchy and decide the _effective_ latency_nice
> > > value?
> >
> > The current use of latency_nice doesn't need to walk the hierarchy
> > because it applies at each scheduling level so the childs
> > automatically follow parents' latency.
>
> Not really, I don't see how that will work that way in the wake up path. The
> wake up path (EAS in particular) does not walk through CPU controller group
> hierarchy from top level, it only cares about cpuset/affinities and the
> "effective" values of tasks.

I was explaining the current use of latency_ni i.e. in this patchset,
I'm not speaking about what should be done for other case like EAS

In fact, it's exactly what I explained few lines above :
"> > At the common parents level of the 2 entities that you want to compare
> > and root level if there no other entity to compare with"

>
> So when you wake up a task, how will you retrieve the attribute for 'prefer
> idle' in the wakeup path using this patchset? The only way is to aggregate
> the CGroup hierarchy information to get a per-task effective value; say using
> a min function.
>
> If you see uclamp_rq_util_with(), that also is using doing uclamp
> aggregation similarly.
>
> So I think Qais is asking about the aggregation function in the EAS wakeup
> path.
>
> Thanks.