Re: [RFC PATCH] sched/fair: correct llc shared domain's number of busy CPUs

From: Vincent Guittot
Date: Mon May 04 2020 - 09:29:59 EST


On Mon, 4 May 2020 at 13:41, Hillf Danton <hdanton@xxxxxxxx> wrote:
>
>
> On Mon, 4 May 2020 09:53:36 Vincent Guittot wrote:
> >
> > On Mon, 4 May 2020 at 03:57, Hillf Danton wrote:
> > >
> > > The comment says, if there is an imbalance between LLC domains (IOW we
> > > could increase the overall cache use), we need some less-loaded LLC
> > > domain to pull some load.
> > >
> > > To show that imbalance, record busy CPUs as they come and go by doing
> > > a minor cleanup for sd::nohz_idle.
> >
> > Your comment failed to explain why we can get rid of sd->nohz_idle.
> >
> The serialization added in 25f55d9d01ad ("sched: Fix init NOHZ_IDLE flag") to
> updating nr_busy_cpus is no longer needed after 0e369d757578 ("sched/core:
> Replace sd_busy/nr_busy_cpus with sched_domain_shared") AFAICT because a

I don't see the link between commit 0e369d757578 and the fact that we
can remove the nohz_idle field.

> recorded idle/busy CPU does not mean the current CPU could not become idle or
> busy. The right thing is to update the counter if we have a valid sd under rcu.

No, that's not the root cause. The sd is per CPU, so each CPU has its
own sd->nohz_idle: if CPU A sets sd->nohz_idle, CPU B is not impacted
and has to set its own.

We must ensure that nr_busy_cpus is inc/dec only once when
transitioning from/to idle/busy state in order to keep the shared
nr_busy_cpus correct. But set_cpu_sd_state_busy() is called from
scheduler_tick() which means potentially every tick:

scheduler_tick() -> trigger_load_balance() -> nohz_balancer_kick() ->
nohz_balance_exit_idle() -> set_cpu_sd_state_busy()

The nohz_idle field is there to prevent incrementing nr_busy_cpus at
every tick. But set_cpu_sd_state_busy() has been called from
nohz_balance_exit_idle() since 00357f5ec5d6 ("sched/nohz: Clean up
nohz enter/exit"), and the latter has a similar mechanism with
rq->nohz_tick_stopped, so sd_llc->nohz_idle is useless.
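
To illustrate the point, here is a minimal userspace sketch, not kernel
code. All names in it (llc_shared, cpu_state, model_*) are hypothetical;
tick_stopped stands in for rq->nohz_tick_stopped, and the counter is a
plain int used from a single thread, whereas the real nr_busy_cpus is
updated atomically. It only shows that one per-CPU flag checked in the
enter/exit path is enough to keep the shared counter correct when the
busy path runs at every tick.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state shared by the CPUs of one LLC. */
struct llc_shared {
	int nr_busy_cpus;
};

/* Hypothetical per-CPU state; tick_stopped models rq->nohz_tick_stopped. */
struct cpu_state {
	bool tick_stopped;
	struct llc_shared *shared;
};

/* CPU goes idle: account the busy -> idle transition exactly once. */
static void model_enter_idle(struct cpu_state *cpu)
{
	if (cpu->tick_stopped)
		return;		/* already accounted as idle */
	cpu->tick_stopped = true;
	assert(cpu->shared->nr_busy_cpus > 0);
	cpu->shared->nr_busy_cpus--;
}

/* CPU becomes busy: account the idle -> busy transition exactly once. */
static void model_exit_idle(struct cpu_state *cpu)
{
	if (!cpu->tick_stopped)
		return;		/* already accounted as busy */
	cpu->tick_stopped = false;
	cpu->shared->nr_busy_cpus++;
}

/* Runs at every tick of a busy CPU, like the scheduler_tick() path. */
static void model_tick(struct cpu_state *cpu)
{
	model_exit_idle(cpu);
}
```

Calling model_tick() repeatedly after one model_enter_idle() bumps the
counter only on the first call; the later calls return early on the
flag, which is the job sd->nohz_idle used to do.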

>
> > you remove the use of sd->nohz_idle but you don't remove it from
> > struct sched_domain
>
> A separate cleanup for it is needed if it's no longer used somewhere else.

Please remove it in the same patch.

>
> Hillf
>