Re: [PATCH 04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core

From: Vincent Guittot
Date: Thu Dec 03 2020 - 11:37:02 EST


On Thu, 3 Dec 2020 at 15:11, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> select_idle_core is called when SMT is active and there is likely a free
> core available. It may find idle CPUs but this information is simply
> discarded and the scan starts over again with select_idle_cpu.
>
> This patch caches information on idle CPUs found during the search for
> a core and uses one of them if no idle core is found. This is a
> tradeoff. There may be a slight cost when utilisation is low and an
> idle core can be found quickly. It provides gains as the number of
> busy CPUs approaches 50% of the domain size when SMT is enabled.
>
> With tbench on a 2-socket CascadeLake machine, 80 logical CPUs, HT enabled
>
>                           5.10.0-rc6             5.10.0-rc6
>                            schedstat          idlecandidate
> Hmean     1        500.06 (   0.00%)      505.67 *   1.12%*
> Hmean     2        975.90 (   0.00%)      974.06 *  -0.19%*
> Hmean     4       1902.95 (   0.00%)     1904.43 *   0.08%*
> Hmean     8       3761.73 (   0.00%)     3721.02 *  -1.08%*
> Hmean     16      6713.93 (   0.00%)     6769.17 *   0.82%*
> Hmean     32     10435.31 (   0.00%)    10312.58 *  -1.18%*
> Hmean     64     12325.51 (   0.00%)    13792.01 *  11.90%*
> Hmean     128    21225.21 (   0.00%)    20963.44 *  -1.23%*
> Hmean     256    20532.83 (   0.00%)    20335.62 *  -0.96%*
> Hmean     320    20334.81 (   0.00%)    20147.25 *  -0.92%*
>
> In this particular test, the cost/benefit is marginal except at 64
> clients, the point where the machine was over 50% busy but not
> fully utilised.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> ---
>  kernel/sched/fair.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fc48cc99b03d..845bc0cd9158 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6066,6 +6066,7 @@ void __update_idle_core(struct rq *rq)
>   */
>  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>  {
> +	int idle_candidate = -1;
>  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>  	int core, cpu;
>
> @@ -6084,7 +6085,13 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>  			schedstat_inc(this_rq()->sis_scanned);
>  			if (!available_idle_cpu(cpu)) {
>  				idle = false;
> -				break;
> +				if (idle_candidate != -1)
> +					break;


If I get your change correctly, the inner loop will now keep iterating
over all CPUs of the SMT mask (at least until a first candidate has
been cached) to try to find an idle CPU, whereas it used to break as
soon as one CPU of the core was not idle. I thought you just wanted to
be opportunistic and save a candidate along the way, without scanning
more CPUs than the current code does.

With the change above, you might end up scanning all CPUs of the LLC
if there is only one idle CPU left in it, whereas before we were
scanning at most one CPU per core. The bottom change makes sense, but
the one above effectively replaces select_idle_cpu() entirely and
bypasses SIS_PROP, and we should avoid that IMO.
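
Something like the below (just an untested sketch on top of your
patch, reusing your idle_candidate and the sis_scanned schedstat from
earlier in the series) would cache a candidate while keeping the
current per-core early break:

	for_each_cpu(cpu, cpu_smt_mask(core)) {
		schedstat_inc(this_rq()->sis_scanned);
		if (!available_idle_cpu(cpu)) {
			idle = false;
			/* keep the per-core early exit */
			break;
		}

		/* cpu is idle here; cache the first allowed one */
		if (idle_candidate == -1 &&
		    cpumask_test_cpu(cpu, p->cpus_ptr))
			idle_candidate = cpu;
	}

That way select_idle_core() never visits more CPUs than it does today,
and select_idle_cpu() with SIS_PROP still controls the wider scan.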

> +			}
> +
> +			if (idle_candidate == -1 &&
> +			    cpumask_test_cpu(cpu, p->cpus_ptr)) {
> +				idle_candidate = cpu;
>  			}
>  		}
>
> @@ -6099,7 +6106,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>  	 */
>  	set_idle_cores(target, 0);
>
> -	return -1;
> +	return idle_candidate;
>  }
>
> /*
> --
> 2.26.2
>