Re: [PATCH v7 6/9] sched/fair: Add sched group latency support

From: Joel Fernandes
Date: Fri Nov 04 2022 - 06:15:08 EST


On Thu, Nov 3, 2022 at 5:03 PM Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Thu, 3 Nov 2022 at 15:27, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > On 11/03/22 09:46, Vincent Guittot wrote:
> > > On Tue, 1 Nov 2022 at 20:28, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > > >
> > > > On 10/28/22 11:34, Vincent Guittot wrote:
> > > > > > A task can set its latency priority with sched_setattr(), which is then used
> > > > > > to set the latency offset of its sched_entity, but sched group entities
> > > > > still have the default latency offset value.
> > > > >
> > > > > Add a latency.nice field in cpu cgroup controller to set the latency
> > > > > priority of the group similarly to sched_setattr(). The latency priority
> > > > > is then used to set the offset of the sched_entities of the group.
> > > > >
> > > > > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > > > ---
> > > > > Documentation/admin-guide/cgroup-v2.rst | 8 ++++
> > > > > kernel/sched/core.c | 52 +++++++++++++++++++++++++
> > > > > kernel/sched/fair.c | 33 ++++++++++++++++
> > > > > kernel/sched/sched.h | 4 ++
> > > > > 4 files changed, 97 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > > > index be4a77baf784..d8ae7e411f9c 100644
> > > > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > > > @@ -1095,6 +1095,14 @@ All time durations are in microseconds.
> > > > > values similar to the sched_setattr(2). This maximum utilization
> > > > > value is used to clamp the task specific maximum utilization clamp.
> > > > >
> > > > > + cpu.latency.nice
> > > > > + A read-write single value file which exists on non-root
> > > > > + cgroups. The default is "0".
> > > > > +
> > > > > + The nice value is in the range [-20, 19].
> > > > > +
> > > > > + This interface file allows reading and setting latency using the
> > > > > + same values used by sched_setattr(2).
> > > >
> > > > I'm still not sure about this [1].
> > >
> > > I'm still not sure about what you are trying to say here ...
> > >
> > > This is about setting a latency nice prio to a group level.
> > >
> > > >
> > > > In some scenarios we'd like to get the effective latency_nice of the task. How
> > > > will the task inherit the cgroup value or be impacted by it?
> > > >
> > > > For example if there are tasks that belong to a latency sensitive cgroup; and
> > > > I'd like to skip some searches in EAS to improve that latency sensitivity - how
> > > > would I extract this info in EAS path given these tasks are using default
> > > > latency_nice value? And what should happen if their latency_nice is set to
> > > > something other than the default?
> > > >
> > > > [1] https://lore.kernel.org/lkml/20221012160734.hrkb5jcjdq7r23pr@wubuntu/
> > >
> > > Hmm so you are speaking about something that is not part of the patch.
> > > Let's focus on the patchset for now.
> >
> > I am focusing on this patchset. Isn't this an essential part of the design?
> > Once the interface is out we can't change it. As it stands, I can't see how it
>
> So, are you speaking about the interface, i.e. setting a value between [-20:19]?
>
> > can be used to replace prefer_idle in cgroup as used in Android. I can't see
> > how this could happen if we don't define how the task will inherit the cgroup
> > value. If we can, mind elaborating how please?
>
> Or how to take into account the value set for a cgroup ?
>
> Regarding the behavior, the rule remains the same that a sched_entity
> attached to a cgroup will not get more (latency in this case) than
> what has been set for the group entity.

I think the interface solves a different problem, which is the latency
of a task or cgroup relative to other groups. Vincent, you are setting
this for a “top app” group in Android in your tests, and seeing an
improvement, correct? AFAICS, this improvement comes from lower latency
during *same CPU* competition between different groups, by juggling
around the wakeup-preemption window -- which may well be good for
Android.

OTOH, the “prefer idle” flag in Android that Qais is referring to will
need a completely different method (one that can complement Vincent's
changes here), as I cannot see how a nice value can communicate that.
It will need to have a per-task interface as well. We have something
similar in ChromeOS, which is a proc knob, with an out-of-tree patch
for it [1]. Without [1] we fail Android CTS testing on a recent ARM64
ChromeOS device.
[1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3884575
The changelog in [1] also has a detailed description of the ChromeOS usecase.

Qais, is there any other reason you can see why Vincent's change would
not be a good thing for Android? Since you have one cgroup for the whole
user-facing app (top app), you can just set that to a low "latency_nice"
and get better wake-up latency for it.
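
For illustration only, here is a minimal userspace sketch of that step.
It assumes the cpu.latency.nice file proposed in this patch (a proposed
interface, not something to rely on being present) and a cgroup
directory supplied by the caller. The helper names are hypothetical.
The [-20, 19] range comes from the documentation hunk quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Range taken from the proposed cgroup-v2.rst text: [-20, 19]. */
#define LATENCY_NICE_MIN (-20)
#define LATENCY_NICE_MAX 19

/*
 * Hypothetical helper: format a latency nice value the way it would be
 * written to the file. Returns the string length, or -1 if the value is
 * out of range or does not fit in buf.
 */
static int format_latency_nice(int nice, char *buf, size_t len)
{
	int n;

	if (nice < LATENCY_NICE_MIN || nice > LATENCY_NICE_MAX)
		return -1;

	n = snprintf(buf, len, "%d\n", nice);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/*
 * Hypothetical helper: write the value to <cgroup_dir>/cpu.latency.nice.
 * The file name is an assumption based on this patch. Returns 0 on
 * success, -1 on any failure (bad value, missing file, write error).
 */
static int set_group_latency_nice(const char *cgroup_dir, int nice)
{
	char path[4096];
	char val[8];
	FILE *f;
	int n, ok;

	if (format_latency_nice(nice, val, sizeof(val)) < 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/cpu.latency.nice", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(val, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would pass the top-app cgroup directory and a negative value,
e.g. set_group_latency_nice(dir, -20), and treat -1 as "interface not
available or value rejected".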

(Side rant about latency and CFS -- IMHO a better long-term solution
for lower latency is to use RT but not throttle; rather, demote. Or
break CFS into multiple tiers, and apply demotion. This is in a way
what Vincent is doing: as the task becomes more CPU-bound, he's
taking away the latency boost. Vincent/Qais, somebody was working on
RT demotion vs. throttling a while back; any idea what the latest on
that is?)

thanks,

- Joel


>
> >
> >
> > Thanks
> >
> > --
> > Qais Yousef