Re: [PATCH v2] sched/fair: Introduce priority load balance to reduce interference from IDLE tasks

From: zhangsong (J)
Date: Fri Aug 19 2022 - 06:54:18 EST



On 2022/8/18 16:31, Vincent Guittot wrote:
Le jeudi 18 août 2022 à 10:46:55 (+0800), Abel Wu a écrit :
On 8/17/22 8:58 PM, Vincent Guittot Wrote:
On Tue, 16 Aug 2022 at 04:53, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:

...

Yes, this is usually a corner case, but suppose that some non-idle tasks bounds to CPU 1-2

and idle tasks bounds to CPU 0-1, so CPU 1 may has many idle tasks and some non-idle

tasks while idle tasks on CPU 1 can not be pulled to CPU 2, when trigger load balance if

CPU 2 should pull some tasks from CPU 1, the bad result is idle tasks of CPU 1 cannot be

migrated and non-idle tasks also cannot be migrated in case of env->loop_max constraint.
env->loop_max adds a break but load_balance will continue with next
tasks so it also tries to pull your non idle task at the end after
several breaks.
Loop will be terminated without LBF_NEED_BREAK if exceeds loop_max :)
Argh yes, my brain is not yet back from vacation
I have been confused by loop_max and loop_break being set to the same value 32

Zhang Song, Could you try the patch below ? If it works, I will prepare a
clean patch with all tags



sched/fair: make sure to try to detach at least one movable task

During load balance we try at most env->loop_max time to move a task. But
it can happen that the LRU tasks (ie tail of the cfs_tasks list) can't
be moved to dst_cpu because of affinity. In this case, loop in the list
until we found at least one.

Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
---
kernel/sched/fair.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da388657d5ac..02b7b808e186 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8052,8 +8052,12 @@ static int detach_tasks(struct lb_env *env)
p = list_last_entry(tasks, struct task_struct, se.group_node);

env->loop++;
- /* We've more or less seen every task there is, call it quits */
- if (env->loop > env->loop_max)
+ /*
+ * We've more or less seen every task there is, call it quits
+ * unless we haven't found any movable task yet.
+ */
+ if (env->loop > env->loop_max &&
+ !(env->flags & LBF_ALL_PINNED))
break;

/* take a breather every nr_migrate tasks */
@@ -10182,7 +10186,9 @@ static int load_balance(int this_cpu, struct rq *this_rq,

if (env.flags & LBF_NEED_BREAK) {
env.flags &= ~LBF_NEED_BREAK;
- goto more_balance;
+ /* Stop if we tried all running tasks */
+ if (env.loop < busiest->nr_running)
+ goto more_balance;
}

/*
--
2.17.1

Thanks for your reply.
I have tried your patch and run test compared with it, it seems that the patch you provide makes no sense.
The test result is below(1000 idle tasks bounds to CPU 0-1 and 10 normal tasks bounds to CPU 1-2):

=================================================================

Without patch:


          6,777.37 msec cpu-clock                 #    1.355 CPUs utilized
            20,812      context-switches          #    0.003 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                 0      page-faults               #    0.000 K/sec
    13,333,983,148      cycles                    #    1.967 GHz
     6,457,930,305      instructions              #    0.48  insn per cycle
     2,125,644,649      branches                  #  313.639 M/sec
         1,690,587      branch-misses             #    0.08% of all branches
      5.001931983 seconds time elapsed

With your patch:


          6,791.46 msec cpu-clock                 #    1.358 CPUs utilized
            20,996      context-switches          #    0.003 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                 0      page-faults               #    0.000 K/sec
    13,467,573,052      cycles                    #    1.983 GHz
     6,516,989,062      instructions              #    0.48  insn per cycle
     2,145,139,220      branches                  #  315.858 M/sec
         1,751,454      branch-misses             #    0.08% of all branches

       5.002274267 seconds time elapsed

With my patch:


          7,495.14 msec cpu-clock                 #    1.499 CPUs utilized
            23,176      context-switches          #    0.003 M/sec
               309      cpu-migrations            #    0.041 K/sec
                 0      page-faults               #    0.000 K/sec
    14,849,083,489      cycles                    #    1.981 GHz
     7,180,832,268      instructions              #    0.48  insn per cycle
     2,363,300,644      branches                  #  315.311 M/sec
         1,964,169      branch-misses             #    0.08% of all branches

       5.001713352 seconds time elapsed
===============================================================

Obviously,  when your patch is applied, the cpu-migrations of normal tasks is still 0 and the
CPU ulization of normal tasks have no improvement compared with no patch applied.
When apply my patch, the cpu-migrations and CPU ulization of normal tasks can both improve.
I cannot explain the result with your patch, you also can test it by yourself.

Best,
Zhang Song


This will cause non-idle tasks cannot achieve more CPU utilization.
Your problem is not linked to IDLE vs NORMAL tasks but to the large
number of pinned tasks that can't migrate on CPU2. You can end with
the same behavior without using IDLE tasks but only NORMAL tasks.
I feel the same thing.

Best,
Abel
.