Re: [PATCH 1/3] sched/fair: Introduce scaled capacity awareness in find_idlest_cpu code path

From: Rohit Jain
Date: Thu Oct 12 2017 - 21:54:46 EST


Hi Joel,


On 10/12/2017 02:47 PM, Joel Fernandes wrote:
> On Thu, Oct 12, 2017 at 10:03 AM, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
>> Hi Joel, Atish,
>>
>> Moving off-line discussions to LKML, just so everyone's on the same page,
>> I actually like this version now and it is outperforming my previous
>> code, so I am on board with this version. It makes the code simpler too.
>
> I think you should have explained what the version does differently.
> Nobody can read your mind.

I apologize for being terse (will do better next time).

This is based on your (offline) suggestion, and rightly so, that
find_idlest_group() today bases its decision on capacity_spare_wake(),
which in turn only looks at the original capacity of the CPU. This
version changes that to look at the current capacity, i.e. what is
left after the CPU's capacity has been scaled down for IRQ/RT/etc.
pressure.
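
To make the difference concrete: capacity_orig_of() returns the full,
unscaled capacity, while capacity_of() returns what remains once RT/IRQ
time has been accounted for. Roughly, the existing accessors in fair.c
look like this (paraphrased here for reference, not part of the patch):

static inline unsigned long capacity_of(int cpu)
{
        /* Capacity left after RT/IRQ/etc. pressure is scaled out. */
        return cpu_rq(cpu)->cpu_capacity;
}

static inline unsigned long capacity_orig_of(int cpu)
{
        /* Original (maximum) capacity of the CPU, never scaled down. */
        return cpu_rq(cpu)->cpu_capacity_orig;
}

So with this change, a CPU that is being hammered by IRQ or RT work no
longer looks as attractive as an equally idle CPU running at full
capacity.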

Also, this version changes find_idlest_group_cpu() to stop filtering
CPUs through the 'full_capacity()' check; instead it searches for the
idlest CPU with the maximum available capacity. This way we can drop
all the 'backup' CPU handling that the previous version (v5) needed.

As you can see, the code works out to be much simpler with the new
search. The extra capacity comparison is acceptable cost-wise because
we are already doing a full CPU search over the sched_group_span()
anyway.

[..]
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 56f343b..a1f622c 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);
>>
>>  static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>>  {
>> -        return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
>> +        return capacity_of(cpu) - cpu_util_wake(cpu, p);
>>  }
>>
>>  /*
>> @@ -5870,6 +5870,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>          unsigned long load, min_load = ULONG_MAX;
>>          unsigned int min_exit_latency = UINT_MAX;
>>          u64 latest_idle_timestamp = 0;
>> +        unsigned int idle_cpu_cap = 0;
>>          int least_loaded_cpu = this_cpu;
>>          int shallowest_idle_cpu = -1;
>>          int i;
>> @@ -5881,6 +5882,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>          /* Traverse only the allowed CPUs */
>>          for_each_cpu_and(i, sched_group_span(group), &p->cpus_allowed) {
>>                  if (idle_cpu(i)) {
>> +                        int idle_candidate = -1;
>>                          struct rq *rq = cpu_rq(i);
>>                          struct cpuidle_state *idle = idle_get_state(rq);
>>                          if (idle && idle->exit_latency < min_exit_latency) {
>> @@ -5891,7 +5893,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>                                   */
>>                                  min_exit_latency = idle->exit_latency;
>>                                  latest_idle_timestamp = rq->idle_stamp;
>> -                                shallowest_idle_cpu = i;
>> +                                idle_candidate = i;
>>                          } else if ((!idle || idle->exit_latency == min_exit_latency) &&
>>                                     rq->idle_stamp > latest_idle_timestamp) {
>>                                  /*
>> @@ -5900,8 +5902,14 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>                                   * a warmer cache.
>>                                   */
>>                                  latest_idle_timestamp = rq->idle_stamp;
>> -                                shallowest_idle_cpu = i;
>> +                                idle_candidate = i;
>>                          }
>> +
>> +                        if (idle_candidate != -1 &&
>> +                            (capacity_of(idle_candidate) > idle_cpu_cap)) {
>> +                                shallowest_idle_cpu = idle_candidate;
>> +                                idle_cpu_cap = capacity_of(idle_candidate);
>> +                        }
>
> This is broken: in case idle_candidate != -1 but idle_cpu_cap makes the
> condition false, you're still setting min_exit_latency, which is
> wrong.

Yes, you're right. I will fix this.
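
Concretely, one way to restructure the loop so that a candidate which
loses the capacity comparison can no longer touch min_exit_latency or
latest_idle_timestamp (just a sketch of the idea, not the final patch:
treat capacity as the primary key and only commit the tie-breaking
state once a candidate has actually won):

        for_each_cpu_and(i, sched_group_span(group), &p->cpus_allowed) {
                if (idle_cpu(i)) {
                        struct rq *rq = cpu_rq(i);
                        struct cpuidle_state *idle = idle_get_state(rq);
                        unsigned int cap = capacity_of(i);

                        /* Candidates with less capacity can never win. */
                        if (cap < idle_cpu_cap)
                                continue;

                        if (cap > idle_cpu_cap ||
                            (idle && idle->exit_latency < min_exit_latency)) {
                                /*
                                 * Strictly better: more capacity, or equal
                                 * capacity with a shallower idle state.
                                 * Commit all tie-breaking state here, and
                                 * only here.
                                 */
                                min_exit_latency = idle ? idle->exit_latency :
                                                          UINT_MAX;
                                latest_idle_timestamp = rq->idle_stamp;
                                shallowest_idle_cpu = i;
                                idle_cpu_cap = cap;
                        } else if ((!idle || idle->exit_latency == min_exit_latency) &&
                                   rq->idle_stamp > latest_idle_timestamp) {
                                /*
                                 * Equal capacity and idle depth; prefer the
                                 * CPU that went idle most recently (warmer
                                 * cache).
                                 */
                                latest_idle_timestamp = rq->idle_stamp;
                                shallowest_idle_cpu = i;
                        }
                }
        }

This also makes the ordering explicit: capacity first, then idle exit
latency, then idle timestamp.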


> Also, this means that if you have 2 CPUs and one is in a shallower idle
> state than the other but has less capacity, then it would select the
> CPU in the deeper idle state, right? So 'shallowest_idle_cpu' loses its
> meaning.

OK, I will change the name.

Thanks,
Rohit
[..]