RE: [PATCH] sched/fair: use dst group while checking imbalance for NUMA balancer

From: Song Bao Hua (Barry Song)
Date: Mon Sep 07 2020 - 05:46:13 EST

> -----Original Message-----
> From: Mel Gorman [mailto:mgorman@xxxxxxx]
> Sent: Monday, September 7, 2020 9:27 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: mingo@xxxxxxxxxx; peterz@xxxxxxxxxxxxx; juri.lelli@xxxxxxxxxx;
> vincent.guittot@xxxxxxxxxx; dietmar.eggemann@xxxxxxx;
> bsegall@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Linuxarm
> <linuxarm@xxxxxxxxxx>; Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>;
> Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>; Valentin Schneider
> <valentin.schneider@xxxxxxx>; Phil Auld <pauld@xxxxxxxxxx>; Hillf Danton
> <hdanton@xxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>
> Subject: Re: [PATCH] sched/fair: use dst group while checking imbalance for
> NUMA balancer
>
> On Mon, Sep 07, 2020 at 07:27:08PM +1200, Barry Song wrote:
> > Something is wrong. In find_busiest_group(), we check whether src has
> > the higher load; in task_numa_find_cpu(), however, we check whether dst
> > will have the higher load after balancing. It therefore does not seem
> > sensible to check src.
> > This may produce a wrong imbalance value. For example, if
> > dst_running = env->dst_stats.nr_running + 1 evaluates to 3 or more, and
> > src_running = env->src_stats.nr_running - 1 evaluates to 1, the current
> > code treats the imbalance as 0 because src_running is smaller than 2.
> > This is inconsistent with the load balancer.
> >
>
> It checks the conditions as if the move were to happen. Have you evaluated
> this for a NUMA balancing load and confirmed that it a) balances properly
> and b) does not increase the scan rate trying to "fix" the problem?

I think the original code was trying to check whether the NUMA migration
would lead to a new imbalance in the load balancer. Suppose src is A, dst
is B, and both have nr_running of 2. If A moves one task to B, A will
have 1 and B will have 3. The load balancer on A will then try to pull a
task from B, since B's nr_running is larger than min_imbalance. But the
NUMA code reports imbalance=0 because it finds that A's nr_running is
smaller than min_imbalance.
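To make the A/B example concrete, here is a minimal user-space C sketch. It
is not the kernel code: the helper names are hypothetical, the threshold of
2 stands in for min_imbalance, the adjustment helper is only loosely
modelled on the kernel's adjust_numa_imbalance() of this era, and the
load-balancer side is reduced to "dst ends up busier and its nr_running
exceeds the threshold". The node_has_spare, load and capacity checks are
ignored.

```c
/*
 * Simplified user-space model of the NUMA move imbalance check.
 * Assumptions: all names here are hypothetical, NUMA_IMBALANCE_MIN (2)
 * stands in for min_imbalance, and every other condition that
 * task_numa_find_cpu() or the load balancer evaluates is ignored.
 */
#include <assert.h>
#include <stdbool.h>

#define NUMA_IMBALANCE_MIN 2 /* hypothetical stand-in for min_imbalance */

/* Ignore the imbalance when the group being checked is almost idle. */
static int model_adjust_numa_imbalance(int imbalance, int nr_running)
{
	if (nr_running <= NUMA_IMBALANCE_MIN)
		return 0;
	return imbalance;
}

/*
 * Imbalance computed for moving one task from src to dst.
 * check_dst == false models the current code (passes src_running);
 * check_dst == true models the patch (passes dst_running).
 */
static int model_numa_move_imbalance(int src_nr_running, int dst_nr_running,
				     bool check_dst)
{
	int src_running, dst_running, imbalance;

	assert(src_nr_running >= 1); /* the task being moved runs on src */

	src_running = src_nr_running - 1;
	dst_running = dst_nr_running + 1;
	imbalance = dst_running - src_running;
	if (imbalance < 0)
		imbalance = 0;

	return model_adjust_numa_imbalance(imbalance,
					   check_dst ? dst_running : src_running);
}

/*
 * Simplified load-balancer view after the move: dst is now the busier
 * group, so its nr_running is what gets compared with the threshold.
 */
static bool model_lb_would_pull_back(int src_nr_running, int dst_nr_running)
{
	int src_running = src_nr_running - 1;
	int dst_running = dst_nr_running + 1;

	return dst_running > src_running &&
	       model_adjust_numa_imbalance(dst_running - src_running,
					   dst_running) != 0;
}
```

With A and B both at 2, model_numa_move_imbalance(2, 2, false) returns 0,
so the move is allowed, while model_lb_would_pull_back(2, 2) is true. Passing
dst_running instead returns 2, which agrees with the load-balancer view.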

I will share more test data if you need it.

>
> --
> Mel Gorman
> SUSE Labs

Thanks
Barry