Re: [PATCH v2] sched/fair: Introduce priority load balance to reduce interference from IDLE tasks

From: zhangsong (J)
Date: Wed Aug 24 2022 - 22:01:27 EST




On 2022/8/23 21:19, Vincent Guittot wrote:
Hi Zhang,

On Mon, 22 Aug 2022 at 08:49, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:

Hi, Vincent,

On 2022/8/20 0:04, Vincent Guittot wrote:
On Fri, 19 Aug 2022 at 14:35, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:

Hi Zhang,

On Fri, 19 Aug 2022 at 12:54, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:


On 2022/8/18 16:31, Vincent Guittot wrote:
Le jeudi 18 août 2022 à 10:46:55 (+0800), Abel Wu a écrit :
On 8/17/22 8:58 PM, Vincent Guittot Wrote:
On Tue, 16 Aug 2022 at 04:53, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:

...

...


Thanks for your reply.
I have tried your patch and compared the test results with it; it seems that
the patch you provided has no effect.
The test results are below (1000 idle tasks bound to CPU 0-1 and 10 normal
tasks bound to CPU 1-2):

=================================================================

Without patch:


6,777.37 msec cpu-clock # 1.355 CPUs utilized
20,812 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
0 page-faults # 0.000 K/sec
13,333,983,148 cycles # 1.967 GHz
6,457,930,305 instructions # 0.48 insn per cycle
2,125,644,649 branches # 313.639 M/sec
1,690,587 branch-misses # 0.08% of all branches
5.001931983 seconds time elapsed

With your patch:


6,791.46 msec cpu-clock # 1.358 CPUs utilized
20,996 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
0 page-faults # 0.000 K/sec
13,467,573,052 cycles # 1.983 GHz
6,516,989,062 instructions # 0.48 insn per cycle
2,145,139,220 branches # 315.858 M/sec
1,751,454 branch-misses # 0.08% of all branches

5.002274267 seconds time elapsed

With my patch:


7,495.14 msec cpu-clock # 1.499 CPUs utilized
23,176 context-switches # 0.003 M/sec
309 cpu-migrations # 0.041 K/sec
0 page-faults # 0.000 K/sec
14,849,083,489 cycles # 1.981 GHz
7,180,832,268 instructions # 0.48 insn per cycle
2,363,300,644 branches # 315.311 M/sec
1,964,169 branch-misses # 0.08% of all branches

5.001713352 seconds time elapsed
===============================================================

Obviously, with your patch applied, the cpu-migrations count of the normal
tasks is still 0 and their CPU utilization shows no improvement compared
with no patch applied.
With my patch applied, both the cpu-migrations count and the CPU
utilization of the normal tasks improve.
I cannot explain the result with your patch; you can also test it yourself.

Do you have more details about the test that you are running?

Do CPU0-2 share their cache?
Which kind of tasks are the normal and idle tasks? Always-running tasks?

I'm going to try to reproduce your problem locally

Some details of your use case are missing. I have tried to reproduce your
example above:
1000 idle tasks bound to CPU 0-1 and 10 normal tasks bound to CPU 1-2

Let's assume that, for whatever reason, the 10 normal tasks wake up on CPU1.
Then the thousand idle tasks are moved to CPU0 by load balance and only the
normal tasks stay on CPU1. Load balance will then move some normal tasks to
CPU2.

The only way I can reproduce something similar to your example is to pin
the 1000 idle tasks on CPU1 so they can't move to CPU0. Then I can see that
load balance reaches the loop_max limit and has a hard time moving normal
tasks to CPU2. But in this latter case, my patch helps to move normal tasks
to CPU2. Something is missing in the description of your use case.

Side note: I have the same kind of problem with 1000 low-priority normal
tasks, so it's not a matter of idle vs normal tasks.

Regards,
Vincent


Sorry for my slow reply.

I have found a test case which illustrates this problem more accurately.
The test case is below.

1. an endless-loop process, run as a normal or idle task
$ cat test.c
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        int i = 0;
        /* sleep interval in microseconds, taken from the command line */
        int duration = atoi(argv[1]);

        while (1) {
                usleep(duration);
                /* short busy loop to consume some CPU time */
                for (i = 0; i < 100000; i++) {}
        }

        return 0;
}
$ gcc -o test test.c

2. first, spawn 500 idle tasks bound to CPU 0-2
3. second, spawn 10 normal tasks also bound to CPU 0-2
4. last, spawn 500 idle tasks bound to CPU 0 only
5. perf stat the normal tasks and collect CPU utilization and cpu-migrations


Below is the whole test script.
$ cat test.sh
#!/bin/bash

# create normal and idle cgroup path
mkdir /sys/fs/cgroup/cpu/normal/
mkdir /sys/fs/cgroup/cpu/idle/

So you put the "idle" tasks in one task group and the normal tasks in
another one. But both groups have the default weight/share, so you lose the
idle weight for the idle tasks and each group gets half the CPU capacity,
i.e. 1.5 CPUs per group. And this is what I get with the current kernel,
with your patch, and with my patch.

In fact, there are enough tasks (510) not pinned to CPU0 to make the system
balance correctly.

With this group hierarchy, the idle tasks have the same weight priority as
the normal tasks. Is that really what you want?
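
(Both groups keep the default weight unless it is changed explicitly; with
the cgroup v1 cpu controller used in the test script this can be checked
with something like:

# both groups default to cpu.shares = 1024, i.e. equal weight
$ cat /sys/fs/cgroup/cpu/normal/cpu.shares
$ cat /sys/fs/cgroup/cpu/idle/cpu.shares
)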

tip kernel:
7 673,15 msec task-clock # 1,534 CPUs utilized
12 662 context-switches # 1,650 K/sec
0 cpu-migrations # 0,000 /sec
5,003493176 seconds time elapsed

your patch:
7 488,35 msec task-clock # 1,497 CPUs utilized
12 338 context-switches # 1,648 K/sec
3 cpu-migrations # 0,401 /sec
5,003406005 seconds time elapsed

my patch:
7 569,57 msec task-clock # 1,513 CPUs utilized
12 460 context-switches # 1,646 K/sec
0 cpu-migrations # 0,000 /sec
5,003437278 seconds time elapsed



I may have only considered the SCHED_IDLE policy of the idle tasks without
changing their weight. It is true that if the weight of the idle tasks is
changed to the minimum value, my patch will not provide any benefit.

OK, I need to think about it more.
Thanks a lot.
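
For example, the idle group's weight could presumably be dropped to the
minimum, or the whole group marked idle where the kernel supports it; a
rough sketch, assuming the cgroup v1 cpu controller from the test script:

# give the idle group the smallest possible weight (MIN_SHARES is 2)
$ echo 2 > /sys/fs/cgroup/cpu/idle/cpu.shares
# or, on kernels with cgroup SCHED_IDLE support, mark the group as idle:
$ echo 1 > /sys/fs/cgroup/cpu/idle/cpu.idle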


# spawn 500 idle tasks and bind tasks to CPU 0-2
for ((i = 0; i < 500; i++))
do
        taskset -c 0-2 ./test 200 &
        pid=$!
        # change to SCHED_IDLE policy
        chrt -i -p 0 $pid
        echo $pid > /sys/fs/cgroup/cpu/idle/tasks
done

# spawn 10 normal tasks and bind tasks to CPU 0-2
normal_tasks=
for ((i = 0; i < 10; i++))
do
        taskset -c 0-2 ./test 500 &
        pid=$!
        normal_tasks+=$pid,
        echo $pid > /sys/fs/cgroup/cpu/normal/tasks
done

# spawn 500 idle tasks and bind tasks to CPU 0 only
for ((i = 0; i < 500; i++))
do
        taskset -c 0 ./test 200 &
        pid=$!
        # change to SCHED_IDLE policy
        chrt -i -p 0 $pid
        echo $pid > /sys/fs/cgroup/cpu/idle/tasks
done

# perf stat normal tasks
perf stat -a -p $normal_tasks sleep 5
pkill -f test


You can try the above case and test it with your patch.

Regards,
Zhang Song