Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY feature (v2)

From: Mathieu Desnoyers
Date: Tue Oct 24 2023 - 10:49:34 EST


On 2023-10-24 02:10, Chen Yu wrote:
On 2023-10-23 at 11:04:49 -0400, Mathieu Desnoyers wrote:
On 2023-10-23 10:11, Dietmar Eggemann wrote:
On 19/10/2023 18:05, Mathieu Desnoyers wrote:

[...]
+static unsigned long scale_rt_capacity(int cpu);
+
+/*
+ * Returns true if adding the task utilization to the estimated
+ * utilization of the runnable tasks on @cpu does not exceed the
+ * capacity of @cpu.
+ *
+ * This considers only the utilization of _runnable_ tasks on the @cpu
+ * runqueue, excluding blocked and sleeping tasks. This is achieved by
+ * using the runqueue util_est.enqueued.
+ */
+static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
+ int cpu)

Or like find_energy_efficient_cpu() (feec(), used in
Energy-Aware-Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to get:

max(util_avg(CPU + p), util_est(CPU + p))

I've tried using cpu_util(), but unfortunately anything that considers
blocked/sleeping tasks in its utilization total does not work for my
use-case.

From cpu_util():

* CPU utilization is the sum of running time of runnable tasks plus the
* recent utilization of currently non-runnable tasks on that CPU.


I thought cpu_util() indicates the decayed utilization sum of tasks that were once
"running" on this CPU, but does not sum up the "util/load" of blocked/sleeping
tasks?

accumulate_sum()
/* only the running task's util will be summed up */
if (running)
sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;

WRITE_ONCE(sa->util_avg, sa->util_sum / divider);

The accumulation into cfs_rq->avg.util_sum indeed only happens while the task
is running, which means that the task does not actively increment util_sum
while it is blocked/sleeping.

However, when the task is blocked/sleeping, the task is still attached to the
runqueue, and therefore its historic util_sum still contributes to the cfs_rq
util_sum/util_avg. This completely differs from what happens when the task is
migrated to a different runqueue, in which case its util_sum contribution is
entirely removed from the cfs_rq util_sum:

static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
	[...]
	update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
	[...]

static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
	[...]
	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
		action |= DO_DETACH;
	[...]

static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
	[...]
	if (!se->avg.last_update_time && (flags & DO_ATTACH)) {

		/*
		 * DO_ATTACH means we're here from enqueue_entity().
		 * !last_update_time means we've passed through
		 * migrate_task_rq_fair() indicating we migrated.
		 *
		 * IOW we're enqueueing a task on a new CPU.
		 */
		attach_entity_load_avg(cfs_rq, se);
		update_tg_load_avg(cfs_rq);

	} else if (flags & DO_DETACH) {
		/*
		 * DO_DETACH means we're here from dequeue_entity()
		 * and we are migrating task out of the CPU.
		 */
		detach_entity_load_avg(cfs_rq, se);
		update_tg_load_avg(cfs_rq);
	[...]
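
To make the block-versus-migrate distinction concrete, here is a toy
user-space model (not kernel code). The toy_* names are made up for
illustration, and PELT decay is deliberately ignored, so treat it as a sketch
of the attach/detach bookkeeping only:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for cfs_rq->avg.util_avg; not the kernel structure. */
struct toy_cfs_rq {
	unsigned long util_avg;
};

/* Models attach_entity_load_avg(): the task's util joins the rq sum. */
static void toy_attach(struct toy_cfs_rq *rq, unsigned long task_util)
{
	rq->util_avg += task_util;
}

/* Models detach_entity_load_avg(): the task's util leaves the rq sum. */
static void toy_detach(struct toy_cfs_rq *rq, unsigned long task_util)
{
	assert(rq->util_avg >= task_util);
	rq->util_avg -= task_util;
}

/*
 * Models the DO_DETACH decision in dequeue_entity(): only a migrating
 * task is detached. A task that merely blocks stays attached, so its
 * utilization keeps counting towards the rq sum.
 */
static void toy_dequeue(struct toy_cfs_rq *rq, unsigned long task_util,
			bool migrating)
{
	if (migrating)
		toy_detach(rq, task_util);
}
```

With two tasks of utilization 300 and 200 attached, blocking the first one
leaves the sum at 500, whereas migrating the second one away drops it to 300.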

In comparison, util_est_enqueue()/util_est_dequeue() are called from enqueue_task_fair()
and dequeue_task_fair(), which also cover the cases where a task blocks or sleeps. Therefore,
cfs_rq->avg.util_est.enqueued only accounts for runnable tasks.
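
As a rough illustration of that bookkeeping, here is a simplified user-space
sketch (not the kernel implementation). The toy_* names are hypothetical, and
the per-task estimate is passed in as a plain number:

```c
#include <assert.h>

/* Toy stand-in for cfs_rq->avg.util_est.enqueued; not the kernel structure. */
struct toy_util_est {
	unsigned long enqueued;
};

/* Models util_est_enqueue(): runs whenever a task becomes runnable. */
static void toy_util_est_enqueue(struct toy_util_est *ue,
				 unsigned long task_util_est)
{
	ue->enqueued += task_util_est;
}

/*
 * Models util_est_dequeue(): runs on every dequeue, including when the
 * task goes to sleep, so a blocked task stops counting immediately.
 */
static void toy_util_est_dequeue(struct toy_util_est *ue,
				 unsigned long task_util_est)
{
	/* Clamp so that the sum can never underflow. */
	if (task_util_est > ue->enqueued)
		task_util_est = ue->enqueued;
	ue->enqueued -= task_util_est;
	assert(ue->enqueued + task_util_est >= task_util_est);
}
```

Enqueueing tasks with estimates 300 and 200 gives 500, and putting the first
one to sleep brings the sum down to 200 right away, with no history left
behind.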

The current rq utilization total used for rq selection should not include the historic
utilization of all blocked/sleeping tasks, because at that point we are deciding where to
bring a recently blocked/sleeping task back onto a runqueue. Counting the historic
util_sum of the other blocked/sleeping tasks still attached to that runqueue as current
utilization misleads the rq selection into thinking that the rq is busier than it
really is.
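
A made-up numeric example, assuming a CPU capacity of 1024, three
blocked tasks of utilization 300 each still attached to the rq, no runnable
tasks, and a waking task of utilization 400. The helper toy_fits() is
hypothetical and only mirrors the "does not exceed the capacity" wording of
the patch comment:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if adding task_util on top of rq_util
 * does not exceed capacity. rq_util is whatever utilization total the
 * caller chooses to consider for the rq.
 */
static bool toy_fits(unsigned long rq_util, unsigned long task_util,
		     unsigned long capacity)
{
	assert(capacity > 0);	/* a zero-capacity CPU is meaningless here */
	return rq_util + task_util <= capacity;
}
```

With a total that still includes the blocked tasks, toy_fits(900, 400, 1024)
is false and the idle CPU is rejected. With a total that only counts runnable
tasks, toy_fits(0, 400, 1024) is true.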

I suspect that cpu_util_without() is a half-successful attempt at solving this by removing
the task p from the considered utilization, but it does not take into account scenarios where
many other tasks happen to be blocked/sleeping as well.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com