Re: [PATCH v2 0/3] sched/pelt: Don't sync hardly *_sum with *_avg

From: Dietmar Eggemann
Date: Tue Jan 04 2022 - 06:47:01 EST


On 22/12/2021 10:37, Vincent Guittot wrote:

IMHO you mean s/hardly/hard here?

> Rick reported performance regressions in bugzilla because of cpu
> frequency being lower than before:
> https://bugzilla.kernel.org/show_bug.cgi?id=215045
>
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
>
> More details are available in commit message of patch 1.
>
> This patchset reverts the commit above and adds several checks when
> propagating the changes in the hierarchy to make sure that we still have
> coherent util_avg and util_sum.
>
> Dietmar found a simple way to reproduce the WARN fixed by
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> by looping on hackbench in several different sched group levels.
>
> This patchset has run on the reproducer with success, but it probably
> needs more tests by people who faced the WARN before.
>
> The changes done on util_sum have also been applied to runnable_sum and
> load_sum, which face the same rounding problem, although this has not
> been reflected in a measurable performance impact.

I think the overall idea here is that:

[add|sub]_positive(&sa->X_avg, Y); (`add` in update_tg_cfs_X())
sa->X_sum = sa->X_avg * divider;

with X in {util, runnable, load}

changes to:

[add|sub]_positive(&sa->X_avg, Y);
[add|sub]_positive(&sa->X_sum, Z);
sa->X_sum = max_t(u32, sa->X_sum, sa->X_avg * MIN_DIVIDER);
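
To make the rounding issue concrete, here is a small userspace sketch
(not kernel code; LOAD_AVG_MAX and the MIN_DIVIDER definition mirror the
kernel's PELT constants, everything else is made up for illustration)
comparing re-deriving _sum from _avg with the new clamp:

#include <stdio.h>
#include <stdint.h>

#define LOAD_AVG_MAX 47742                  /* max PELT accumulator */
#define MIN_DIVIDER  (LOAD_AVG_MAX - 1024)  /* lowest possible divider */

int main(void)
{
        uint32_t divider = LOAD_AVG_MAX; /* assume period_contrib == 1024 */
        uint64_t util_sum = 123456789;   /* arbitrary accumulated sum */
        uint32_t util_avg = util_sum / divider;

        /* old: re-derive _sum from _avg, truncation loses the remainder */
        uint64_t synced = (uint64_t)util_avg * divider;

        /* new: keep the independently tracked _sum, only clamp from below */
        uint64_t clamped = util_sum;
        if (clamped < (uint64_t)util_avg * MIN_DIVIDER)
                clamped = (uint64_t)util_avg * MIN_DIVIDER;

        printf("util_sum=%llu synced=%llu (lost %llu) clamped=%llu\n",
               (unsigned long long)util_sum,
               (unsigned long long)synced,
               (unsigned long long)(util_sum - synced),
               (unsigned long long)clamped);
        return 0;
}

The old scheme always rounds _sum down (here by ~44k per sync), which is
the kind of systematic underestimation that showed up as lower cpu
frequency in the bugzilla report.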

This change is done in:

(1) update_cfs_rq_load_avg()
(2) detach_entity_load_avg() and dequeue_load_avg()
(3) update_tg_cfs_X() (+ down-propagating _sum independently from _avg)

Prior to:

1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
fcf6631f3736 ("sched/pelt: Ensure that *_sum is always synced w/ *_avg")
ceb6ba45dc80 ("sched/fair: Sync load_sum with load_avg after dequeue")

(i.e. the commits which get fixed by this patchset):

sub_positive(&sa->X_avg, Y);
sub_positive(&sa->X_sum, Z);

was used in (1) and (2).
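
For reference, sub_positive() (kernel/sched/fair.c, quoted from memory so
it may differ slightly between versions) is an unsigned subtract that
clamps at zero on underflow:

#define sub_positive(_ptr, _val) do {                           \
        typeof(_ptr) ptr = (_ptr);                              \
        typeof(*ptr) val = (_val);                              \
        typeof(*ptr) res, var = READ_ONCE(*ptr);                \
        res = var - val;                                        \
        if (res > var)                                          \
                res = 0;                                        \
        WRITE_ONCE(*ptr, res);                                  \
} while (0)

The load-store through READ_ONCE()/WRITE_ONCE() makes sure lockless
readers never observe an intermediate negative value.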

(3) already used sa->util_sum = sa->util_avg * divider before (since
95d685935a2e ("sched/pelt: Sync util/runnable_sum with PELT window when
propagating")).

[...]