Re: [PATCH] sched/fair: Fix fixed point arithmetic width for shares and effective load

From: Dietmar Eggemann
Date: Wed Aug 24 2016 - 06:41:31 EST


On 23/08/16 21:40, Paul Turner wrote:
> On Mon, Aug 22, 2016 at 7:00 AM, Dietmar Eggemann

[...]

>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 61d485421bed..18f80c4c7737 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2530,8 +2530,8 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>> if (tg_weight)
>> shares /= tg_weight;
>>
>> - if (shares < MIN_SHARES)
>> - shares = MIN_SHARES;
>> + if (shares < scale_load(MIN_SHARES))
>> + shares = scale_load(MIN_SHARES);
>> if (shares > tg->shares)
>> shares = tg->shares;
>
>
> MIN_SHARES is never scaled; it is an independent floor on the value
> that can be assigned as a weight, so we never need to scale it down
> (this would actually allow the weight to drop to zero which would be
> bad).

True but I'm using scale_load() and not scale_load_down() here.

I was under the impression that we have different floor values for the
two fixed point arithmetics now.

It's already used in sched_group_set_shares() today, so I thought this
lower floor value is 2048 instead of 2 for 20 bit and we should use the
same value in calc_cfs_shares().

sched_group_set_shares()

shares = clamp(shares, scale_load(MIN_SHARES), scale_load(MAX_SHARES));
...
tg->shares = shares;

>> @@ -5023,9 +5023,9 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>> * wl = S * s'_i; see (2)
>> */
>> if (W > 0 && w < W)
>> - wl = (w * (long)tg->shares) / W;
>> + wl = (w * (long)scale_load_down(tg->shares)) / W;
>> else
>> - wl = tg->shares;
>> + wl = scale_load_down(tg->shares);
>
>
> This part looks good, effective_load() works with
> source_load/target_load values, which originate in unscaled values.

Yes.

>
> Independent of this patch:
> When we initially introduced load scaling, it was ~uniform on every
> value. Most of the current pain has come from, and will continue to
> come from, that with v2 of the load-tracking this is no longer the
> case. We have a massive number of scaled and unscaled inputs floating
> around, many of them derived values (e.g. source_load above) which
> require chasing.
>
> I propose we simplify this. Load scaling is only here so that the
> load-balancer can make sane decisions. We only need
> cfs_rq->load.weight values to be scaled.

I see, i.e. no scaling of tg->shares any more.

>
> This means:
> (1) scaling in calculating group se->se.weight (calc_cfs_shares)
> (2) probable scaling in h_load calculations
> (2) if you have a calculation involving a cfs_rq->load.weight value,
> you may need to think about scaling.
> [There's a bunch of obvious places this covers, most of them
> are the load-balancer. There's some indirects, e.g. the removed need
> to scale in calculating vruntime, but these are easy to audit just by
> searching for existing calls to scale.]
>
> Probably missed one, but the fact that this list can be written means
> its 1000 pages shorter than today's requirements.
>
> The fact that (3) becomes the only rule to remember for the most part
> makes reasoning about all of this stuff possible again because right
> now it sucks.