Re: [PATCH] sched/fair: Fix the nohz.next_balance update mess

From: Vincent Guittot
Date: Mon Feb 06 2017 - 03:07:37 EST


Hi Wanpeng

On 5 February 2017 at 10:57, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>
> The commit:
> c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update")
>
> intends to update nohz.next_balance in two steps.
>
> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance()
> to gather the shortest next balance of other idle CPUs before
> updating nohz.next_balance.
> 2) The ILB CPU updates the nohz.next_balance according to its own
> next_balance after load balance on behalf of other idle CPUs.
>
> However, there is a mess which breaks the original intention of the

Have you got details of the mess that this generates ?

> first step, every idle CPUs update nohz.next_balance during ILB CPU
> on behalf of them to do load balance, and then the ILB CPU utilizes
> next_balance variable in nohz_idle_balance() to gather the shortest
> next balance of other idle CPUs before updating nohz.next_balance.
>
> This patch fixes it by don't update nohz.next_balance for other idle
> CPUs when ILB CPU on behalf of them to do load balance.

But how do you take into account the next balance of other idle CPUs ?

>
> Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> CC: Ingo Molnar <mingo@xxxxxxxxxx>
> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> ---
> kernel/sched/fair.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 274c747..83948a4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
> * balance for itself and we need to update the
> * nohz.next_balance accordingly.
> */
> - if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
> + if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance) &&
> + !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_rq()->cpu)))
> nohz.next_balance = rq->next_balance;

With this change only the ILB CPU will update the nohz.next_balance
but what about the next_balance of other idle CPUs ?
The nohz.next_balance must be the next_balance of all idle CPU not only the ILB.
So an idle CPU (other than the ILB) will have to wait for the ILB
CPU's period evcen if it has shorter load balance period

Vincent

> #endif
> }
> --
> 2.7.4
>