Re: [PATCH] sched/fair: prevent cpu burst too many periods

From: Benjamin Segall
Date: Mon Nov 29 2021 - 17:20:29 EST


Honglei Wang <wanghonglei@xxxxxxxxxxxxxxx> writes:

> Tasks might get more cpu than quota in persistent periods due to the
> cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
> burstable CFS controller"). For example, one task group whose quota is
> 100ms per period and can get 100ms burst, and its avg utilization is
> around 105ms per period. Once this group gets a free period which
> leaves enough runtime, it has a chance to get computing power more
> than its quota for 10 periods or more in common bandwidth configuration
> (say, 100ms as period). It means tasks can 'steal' the bursted power to
> do daily jobs because all tasks could be scheduled out or sleep to help
> the group get free periods.
>
> I believe the purpose of cpu burst is to help handling bursty workload.
> But if one task group can get computing power more than its quota for
> persistent periods even there is no bursty workload, it's kinda broke.
>
> This patch limits the burst to one period so that it won't break the
> quota limit for long. With this, we can give task group more cpu burst
> power to handle the real bursty workload and don't worry about the
> 'stealing'.
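
To make the scenario in the changelog concrete, here is a rough userspace
sketch, not kernel code. It models only the refill arithmetic from the diff
below, with times in ms. struct bw_model, min64() and periods_over_quota()
are names made up for this illustration, and the model assumes the group
simply consumes min(demand, available runtime) in every period.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the cfs_b fields used by the refill. */
struct bw_model {
	int64_t quota;        /* runtime added at each period refresh */
	int64_t burst;        /* extra runtime that may be carried over */
	int64_t runtime;      /* runtime currently available */
	int64_t runtime_snap; /* runtime available right after the last refresh */
};

static int64_t min64(int64_t a, int64_t b)
{
	return a < b ? a : b;
}

/* Refill arithmetic before the patch: carry over leftovers, cap at quota + burst. */
static void refill_current(struct bw_model *b)
{
	b->runtime = min64(b->runtime + b->quota, b->quota + b->burst);
	b->runtime_snap = b->runtime;
}

/* Refill arithmetic with the patch: drop leftovers once a period ran over quota. */
static void refill_patched(struct bw_model *b)
{
	int64_t over = b->runtime_snap - b->quota - b->runtime;

	if (over > 0)
		b->runtime = b->quota;
	else
		b->runtime = min64(b->runtime + b->quota, b->quota + b->burst);
	b->runtime_snap = b->runtime;
}

/*
 * One idle period, then `periods` periods that each want `demand` ms.
 * Returns how many of those periods consumed more than the quota.
 * quota = burst = 100 is the example configuration from the changelog.
 */
static int periods_over_quota(void (*refill)(struct bw_model *),
			      int64_t demand, int periods)
{
	struct bw_model b = {
		.quota = 100, .burst = 100, .runtime = 100, .runtime_snap = 100,
	};
	int over = 0;

	refill(&b); /* the free period: nothing was consumed */
	for (int i = 0; i < periods; i++) {
		int64_t used = min64(demand, b.runtime);

		b.runtime -= used;
		if (used > b.quota)
			over++;
		refill(&b);
	}
	return over;
}
```

In this model, a steady demand of 105 over 30 periods gives
periods_over_quota(refill_current, 105, 30) == 20, while
periods_over_quota(refill_patched, 105, 30) == 1.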

CCing the burst patch author.

Whether burst is meant only for bursty workloads, or also for a bit of
long-term fairness, is not entirely clear to me. Assuming we want it
only for bursts, cutting it off this sharply has an additional
downside: if a period refresh lands in the middle of a burst, you lose
the remaining burst runtime. Permitting only two periods in a row to
make use of burst should be doable, but it's yet another piece of state
added to cfs_b, and given typical ~100ms periods the odds of a refresh
landing mid-burst may be low enough that we don't care.
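
A rough userspace sketch of that tradeoff, not kernel code and only one
possible reading of the two-period idea. The burst_periods counter is
hypothetical (cfs_b has no such field), as are struct bw_model, min64() and
granted(). Times are in ms, and the pool starts full at quota + burst = 200.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for cfs_b. */
struct bw_model {
	int64_t quota;
	int64_t burst;
	int64_t runtime;      /* runtime currently available */
	int64_t runtime_snap; /* runtime available right after the last refresh */
	int burst_periods;    /* hypothetical: consecutive over-quota periods */
};

static int64_t min64(int64_t a, int64_t b)
{
	return a < b ? a : b;
}

/* Refill arithmetic from the patch: one over-quota period drops the leftovers. */
static void refill_patched(struct bw_model *b)
{
	int64_t over = b->runtime_snap - b->quota - b->runtime;

	if (over > 0)
		b->runtime = b->quota;
	else
		b->runtime = min64(b->runtime + b->quota, b->quota + b->burst);
	b->runtime_snap = b->runtime;
}

/* Assumed variant: leftovers are dropped only after two over-quota periods in a row. */
static void refill_two_periods(struct bw_model *b)
{
	int64_t over = b->runtime_snap - b->quota - b->runtime;

	b->burst_periods = over > 0 ? b->burst_periods + 1 : 0;
	if (b->burst_periods >= 2) {
		b->runtime = b->quota;
		b->burst_periods = 0;
	} else {
		b->runtime = min64(b->runtime + b->quota, b->quota + b->burst);
	}
	b->runtime_snap = b->runtime;
}

/* Total runtime granted when period i wants demand[i] ms, starting from a full pool. */
static int64_t granted(void (*refill)(struct bw_model *),
		       const int64_t *demand, int n)
{
	struct bw_model b = {
		.quota = 100, .burst = 100, .runtime = 200, .runtime_snap = 200,
		.burst_periods = 0,
	};
	int64_t total = 0;

	for (int i = 0; i < n; i++) {
		int64_t used = min64(demand[i], b.runtime);

		b.runtime -= used;
		total += used;
		refill(&b);
	}
	return total;
}
```

For a 300ms burst split evenly across a refresh (demand of 150 in each of
two periods), the patched refill grants 250 in this model and the
two-period variant grants the full 300. For a steady demand of 105, the
variant still allows only two over-quota periods before falling back to
quota.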

>
> Signed-off-by: Honglei Wang <wanghonglei@xxxxxxxxxxxxxxx>
> ---
> kernel/sched/fair.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e476f6d9435..cc2c4567fc81 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4640,14 +4640,17 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  	if (unlikely(cfs_b->quota == RUNTIME_INF))
>  		return;
>
> -	cfs_b->runtime += cfs_b->quota;
> -	runtime = cfs_b->runtime_snap - cfs_b->runtime;
> +	runtime = cfs_b->runtime_snap - cfs_b->quota - cfs_b->runtime;
> +
>  	if (runtime > 0) {
>  		cfs_b->burst_time += runtime;
>  		cfs_b->nr_burst++;
> +		cfs_b->runtime = cfs_b->quota;
> +	} else {
> +		cfs_b->runtime += cfs_b->quota;
> +		cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>  	}
>
> -	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>  	cfs_b->runtime_snap = cfs_b->runtime;
>  }

If we do this, it should also be mentioned in
Documentation/scheduler/sched-bwc.rst, since the straightforward
description of burst as extra max runtime is no longer enough.