Re: [RFC PATCH 1/9] sched,cgroup: Add interface for latency-nice

From: Peter Zijlstra
Date: Thu Sep 05 2019 - 06:46:29 EST


On Thu, Sep 05, 2019 at 10:45:27AM +0100, Patrick Bellasi wrote:

> > From just reading the above, I would expect it to have the range
> > [-20,19] just like normal nice. Apparently this is not so.
>
> Regarding the range for the latency-nice values, I guess we have two
> options:
>
> - [-20..19], which makes it similar to priorities
>   downside: we quite likely end up with a kernel space representation
>   which does not match the user-space one, e.g. look at
>   task_struct::prio.
>
> - [0..1024], which makes it more similar to a "percentage"
>
> Since latency-nice is a new concept, we are not constrained by POSIX and
> IMHO the [0..1024] scale is a better fit.
>
> That will translate into:
>
> latency-nice=0 : default (current mainline) behaviour, all "biasing"
> policies are disabled and we wake up as fast as possible
>
> latency-nice=1024 : maximum niceness, where for example we can imagine
> switching a CFS task to be SCHED_IDLE?

There are a few things wrong there. I really feel that if we call it
nice, it should be like nice. Otherwise we should call it latency-bias
and not have the association with nice to confuse people.

Secondly, the default should be in the middle of the range. Naturally
this would be a signed range like nice, [-(x+1),x] for some x. If you
want [0,1024], then the default really should be 512, but personally I
like 0 better as a default, in which case we need negative numbers.

This is important because we want to be able to bias towards giving
(tail) latency less importance as well as towards giving it more
importance.

Specifically, Oracle wants to sacrifice (some) latency for throughput.
Facebook OTOH seems to want to sacrifice (some) throughput for latency.
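
To make the range discussion concrete, here is a minimal sketch of how a
signed, nice-like user value with a default of 0 could be mapped onto an
internal [0,1024] scale whose midpoint is 512. This is an illustration
only and is not taken from the patch series: the macro names, the helper
latency_nice_to_scale() and the linear mapping are all assumptions.

```c
/* Hypothetical names and values; nothing here comes from the patch series. */
#define LATENCY_NICE_MIN      (-20)   /* most latency sensitive */
#define LATENCY_NICE_MAX      19      /* least latency sensitive */
#define LATENCY_NICE_DEFAULT  0       /* sits in the middle, like nice */
#define LATENCY_SCALE_MID     512     /* midpoint of the internal [0,1024] scale */

/* Clamp a user-supplied value into [LATENCY_NICE_MIN, LATENCY_NICE_MAX]. */
static int latency_nice_clamp(int nice)
{
	if (nice < LATENCY_NICE_MIN)
		return LATENCY_NICE_MIN;
	if (nice > LATENCY_NICE_MAX)
		return LATENCY_NICE_MAX;
	return nice;
}

/*
 * Map the signed user-space value onto the internal scale (assumed linear).
 * The default of 0 lands exactly on the midpoint, -20 lands on 0, and
 * positive values move towards the top of the scale. Each step is worth
 * 512/20 internal units, with integer division truncating towards zero.
 */
static int latency_nice_to_scale(int nice)
{
	nice = latency_nice_clamp(nice);
	return LATENCY_SCALE_MID + nice * LATENCY_SCALE_MID / -LATENCY_NICE_MIN;
}
```

With this mapping a task can be biased in either direction from the
default: negative values ask for lower latency, positive values give
latency away, and user space never sees the internal representation.
Because the range is asymmetric, like nice, the top of the internal
scale is not quite reached at 19.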