Re: [PATCH v3 2/8] sched/fair: Maintain the identity of idle-core

From: Vincent Guittot
Date: Tue May 25 2021 - 03:11:28 EST


On Sat, 22 May 2021 at 16:11, Srikar Dronamraju
<srikar@xxxxxxxxxxxxxxxxxx> wrote:
>
> * Vincent Guittot <vincent.guittot@xxxxxxxxxx> [2021-05-22 14:42:00]:
>
> > On Fri, 21 May 2021 at 15:31, Srikar Dronamraju
> > <srikar@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > * Vincent Guittot <vincent.guittot@xxxxxxxxxx> [2021-05-21 14:36:15]:
> > >
> > > > On Thu, 13 May 2021 at 09:40, Srikar Dronamraju
> > > > <srikar@xxxxxxxxxxxxxxxxxx> wrote:

...

> > > > > +#endif
> > > >
> > > > CPUA-core0 enters idle
> > > > All other CPUs of core0 are already idle
> > > > set idle_core = core0
> > > > CPUB-core1 enters idle
> > > > All other CPUs of core1 are already idle so core1 becomes idle
> > > >
> > > > A task wakes up and select_idle_core returns CPUA-core0
> > > > then idle_core=-1
> > > >
> > > > At the next wakeup, we skip select_idle_core() even though core1 is idle
> > > >
> > > > Am I missing something?
> > > >
> > >
> > > You are right, but this is similar to what we currently do too. Even
> > > without this patch, we go ahead and unconditionally set the idle-core
> > > to -1. (We don't even have an option to check whether the selected CPU
> > > was from an idle-core.) Please see the hunk I removed below.
> >
> > The current race window is limited to the time between select_idle_core()
> > not finding an idle core and the call to set_idle_cores(this, false)
> > later, at the end of select_idle_cpu().
> >
>
> However, let's say there was only one idle-core and select_idle_core() finds
> this idle-core; it just doesn't reset has-idle-core. So on the next wakeup,
> we end up iterating through the whole LLC to find out if we have an idle-core.

Yes, the current algorithm clears the idle core flag only when it
hasn't been able to find an idle core, in order to stay cheap in the
fast wakeup path.
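
To make the scenario quoted earlier in this thread concrete, here is a
minimal user-space sketch, not kernel code. It assumes a toy LLC with two
cores of two SMT threads each. All toy_* names are hypothetical, and so is
the rule that the hint is only recorded when it is currently -1. It only
illustrates how a single idle_core id that is reset to -1 when that core is
picked can hide a second idle core from the next wakeup.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CORES		2
#define TOY_SMT_PER_CORE	2

/* Toy model of one LLC; the layout is an assumption for illustration. */
struct toy_llc {
	bool cpu_idle[TOY_NR_CORES][TOY_SMT_PER_CORE];
	int idle_core;		/* id of one known idle core, or -1 */
};

static void toy_init(struct toy_llc *llc)
{
	for (int core = 0; core < TOY_NR_CORES; core++)
		for (int thread = 0; thread < TOY_SMT_PER_CORE; thread++)
			llc->cpu_idle[core][thread] = false;
	llc->idle_core = -1;
}

/* A core is idle only when every SMT thread on it is idle. */
static bool toy_core_is_idle(const struct toy_llc *llc, int core)
{
	for (int thread = 0; thread < TOY_SMT_PER_CORE; thread++)
		if (!llc->cpu_idle[core][thread])
			return false;
	return true;
}

/* A CPU enters idle; record its core as the hint if no hint is set yet. */
static void toy_enter_idle(struct toy_llc *llc, int core, int thread)
{
	assert(core >= 0 && core < TOY_NR_CORES);
	assert(thread >= 0 && thread < TOY_SMT_PER_CORE);

	llc->cpu_idle[core][thread] = true;
	if (llc->idle_core < 0 && toy_core_is_idle(llc, core))
		llc->idle_core = core;
}

/*
 * Wakeup path: scan for an idle core only when the hint says one exists.
 * The hint is reset to -1 whether or not the scan succeeds, which is the
 * behaviour questioned in the thread. Returns the core picked, or -1.
 */
static int toy_wakeup(struct toy_llc *llc)
{
	int picked = -1;

	if (llc->idle_core < 0)
		return -1;	/* scan skipped entirely */

	for (int core = 0; core < TOY_NR_CORES; core++) {
		if (toy_core_is_idle(llc, core)) {
			llc->cpu_idle[core][0] = false;	/* task runs here */
			picked = core;
			break;
		}
	}
	llc->idle_core = -1;
	return picked;
}
```

Replaying the sequence (both threads of core0 go idle, then both threads of
core1, then two wakeups): in this toy model the first wakeup picks core0,
and the second one skips the scan although core1 is still fully idle.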


>
> Also even if there were more than one idle-core in LLC and the task had a
> limited cpu_allowed_list, and hence had to skip the idle-core, then we still
> go ahead and reset the idle-core.
>
> > In your proposal, the race is no longer limited in time. As soon as
> > the first core that went idle and set idle_core is selected by
> > select_idle_core(), idle_core is broken.
> >
>
> Yes, but with the next patch, as soon as a CPU within this LLC goes to idle,
> it will search and set the right idle-core.
>
> > >
> > > I try to improve upon this in the next iteration. But there again, we are
> > > seeing somewhat higher utilization, probably because of that change.
> > >
> > > I plan to move to a cpumask-based approach in v4, with which we don't
> > > have to search in order to set an idle-core but still know whether any
> > > idle-cores are around. However, that will have the extra penalty of the
> > > atomic operations that you commented on in one of my patches.
> > >
> > > But if you have other ideas, I would be willing to try them out.
> > >
> > > >
> > > >
> > > > >  				return i;
> > > > > +			}
> > > > >
> > > > >  		} else {
> > > > >  			if (!--nr)
> > > > > @@ -6218,9 +6226,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > > > >  		}
> > > > >  	}
> > > > >
> > > > > -	if (has_idle_core)
> > > > > -		set_idle_cores(this, false);
> > > > > -
> > >
> > > I was referring to this hunk.
> > >
> > > > >  	if (sched_feat(SIS_PROP) && !has_idle_core) {
> > > > >  		time = cpu_clock(this) - time;
> > > > >  		update_avg(&this_sd->avg_scan_cost, time);
> > > > > @@ -6276,10 +6281,9 @@ static inline bool asym_fits_capacity(int task_util, int cpu)
> > > > >   */
> > > > >  static int select_idle_sibling(struct task_struct *p, int prev, int target)
> > > > >  {
> > > > > -	bool has_idle_core = false;
> > > > > +	int i, recent_used_cpu, idle_core = -1;
> > > > >  	struct sched_domain *sd;
> > > > >  	unsigned long task_util;
> > > > > -	int i, recent_used_cpu;
> > > > >
> > > > >  	/*
> > > > >  	 * On asymmetric system, update task utilization because we will check
> > > > > @@ -6357,16 +6361,16 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> > > > >  		return target;
> > > > >
> > > > >  	if (sched_smt_active()) {
> > > > > -		has_idle_core = test_idle_cores(target, false);
> > > > > +		idle_core = get_idle_core(target, -1);
> > > > >
> > > > > -		if (!has_idle_core && cpus_share_cache(prev, target)) {
> > > > > +		if (idle_core < 0 && cpus_share_cache(prev, target)) {
> > > > >  			i = select_idle_smt(p, sd, prev);
> > > > >  			if ((unsigned int)i < nr_cpumask_bits)
> > > > >  				return i;
> > > > >  		}
> > > > >  	}
> > > > >
> > > > > -	i = select_idle_cpu(p, sd, has_idle_core, target);
> > > > > +	i = select_idle_cpu(p, sd, idle_core, target);
> > > > >  	if ((unsigned)i < nr_cpumask_bits)
> > > > >  		return i;
> > > > >
> > > > >
> > > > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > > > index a189bec13729..22fbb50b036e 100644
> > > > > --- a/kernel/sched/sched.h
> > > > > +++ b/kernel/sched/sched.h
> > > > > @@ -1491,6 +1491,9 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
> > > > > DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
> > > > > DECLARE_PER_CPU(int, sd_llc_size);
> > > > > DECLARE_PER_CPU(int, sd_llc_id);
> > > > > +#ifdef CONFIG_SCHED_SMT
> > > > > +DECLARE_PER_CPU(int, smt_id);
> > > > > +#endif
> > > > > DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> > > > > DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> > > > > DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> > > > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > > > > index 55a0a243e871..232fb261dfc2 100644
> > > > > --- a/kernel/sched/topology.c
> > > > > +++ b/kernel/sched/topology.c
> > > > > @@ -644,6 +644,9 @@ static void destroy_sched_domains(struct sched_domain *sd)
> > > > > DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
> > > > > DEFINE_PER_CPU(int, sd_llc_size);
> > > > > DEFINE_PER_CPU(int, sd_llc_id);
> > > > > +#ifdef CONFIG_SCHED_SMT
> > > > > +DEFINE_PER_CPU(int, smt_id);
> > > > > +#endif
> > > > > DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> > > > > DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> > > > > DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> > > > > @@ -667,6 +670,9 @@ static void update_top_cache_domain(int cpu)
> > > > >  	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
> > > > >  	per_cpu(sd_llc_size, cpu) = size;
> > > > >  	per_cpu(sd_llc_id, cpu) = id;
> > > > > +#ifdef CONFIG_SCHED_SMT
> > > > > +	per_cpu(smt_id, cpu) = cpumask_first(cpu_smt_mask(cpu));
> > > > > +#endif
> > > > >  	rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
> > > > >
> > > > >  	sd = lowest_flag_domain(cpu, SD_NUMA);
> > > > > @@ -1497,6 +1503,7 @@ sd_init(struct sched_domain_topology_level *tl,
> > > > >  		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
> > > > >  		atomic_inc(&sd->shared->ref);
> > > > >  		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
> > > > > +		sd->shared->idle_core = -1;
> > > > >  	}
> > > > >
> > > > >  	sd->private = sdd;
> > > > > --
> > > > > 2.18.2
> > > > >
> > >
> > > --
> > > Thanks and Regards
> > > Srikar Dronamraju
>
> --
> Thanks and Regards
> Srikar Dronamraju