Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

From: Josef Bacik
Date: Fri May 29 2015 - 17:04:50 EST


On 05/28/2015 07:05 AM, Peter Zijlstra wrote:

> So maybe you want something like the below; that cures the thing Morten
> raised, and we continue looking for sd, even after we found affine_sd.
>
> It also avoids the pointless idle_cpu() check Mike raised by making
> select_idle_sibling() return -1 if it doesn't find anything.
>
> Then it continues doing the full balance IFF sd was set, which is keyed
> off of sd->flags.
>
> And note (as Mike already said), BALANCE_WAKE does _NOT_ look for idle
> CPUs, it looks for the least loaded CPU. And it's damn expensive.
>
>
> Rewriting this entire thing is somewhere on the todo list :/


Summarizing what I've found so far.

- We turn on SD_BALANCE_WAKE on our domains for our 3.10 boxes, but not for our 4.0 boxes (due to some weird configuration issue).
- Running with this patch is better than plain 4.0 but not as good as my patch. Running with SD_BALANCE_WAKE set or not set makes no difference to the runs.
- I took out the sd = NULL; bit from the affine case like you said on IRC, and I get similar results as before.
- I'm thoroughly confused as to why my patch did anything, since we weren't turning on SD_BALANCE_WAKE on 4.0 in my previous runs (I assume; it isn't set now, so I'm pretty sure the problem has always been there). That means we should always have had sd == NULL, so we would only ever have gotten the task's CPU, I guess.
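
To spell out the reasoning in that last point, here is a toy user-space model, not the kernel code. It only captures the idea that the balance domain is picked by checking each domain's flags, and that if no domain has the requested flag, sd stays NULL and the task stays on its own CPU. The toy_domain struct, the TOY_* flag values, and the balanced_cpu placeholder (standing in for whatever the full balance path would return) are all made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up flag values for the example; not the kernel's definitions. */
#define TOY_BALANCE_WAKE 0x1u
#define TOY_WAKE_AFFINE  0x2u

/* Hypothetical stand-in for a sched_domain: flags plus a parent link. */
struct toy_domain {
	unsigned int flags;
	struct toy_domain *parent;
};

/*
 * Walk up from the lowest domain and remember the last (highest) one
 * that has sd_flag set. Returns NULL when no domain has the flag.
 */
static struct toy_domain *find_balance_domain(struct toy_domain *lowest,
					      unsigned int sd_flag)
{
	struct toy_domain *sd = NULL;

	for (struct toy_domain *tmp = lowest; tmp; tmp = tmp->parent) {
		if (tmp->flags & sd_flag)
			sd = tmp;
	}
	return sd;
}

/*
 * With no matching domain (sd == NULL) the full balance is skipped and
 * the task's own CPU is returned. balanced_cpu is a placeholder for the
 * result of the expensive path, which this model does not implement.
 */
static int toy_select_cpu(struct toy_domain *lowest, unsigned int sd_flag,
			  int task_cpu, int balanced_cpu)
{
	assert(task_cpu >= 0);

	if (!find_balance_domain(lowest, sd_flag))
		return task_cpu;
	return balanced_cpu;
}
```

In this model, a hierarchy where no domain carries TOY_BALANCE_WAKE always yields task_cpu, which is the behaviour I would have expected on the 4.0 boxes.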

Now I'm looking at the code in select_idle_sibling(), and we do this:

	for_each_lower_domain(sd) {
		sg = sd->groups;
		do {
			if (!cpumask_intersects(sched_group_cpus(sg),
						tsk_cpus_allowed(p)))
				goto next;

			for_each_cpu(i, sched_group_cpus(sg)) {
				if (i == target || !idle_cpu(i))
					goto next;
			}

			return cpumask_first_and(sched_group_cpus(sg),
						 tsk_cpus_allowed(p));
next:
			sg = sg->next;
		} while (sg != sd->groups);
	}
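
To make the effect of that inner loop concrete, here is a small user-space sketch, not kernel code. It models a scheduling group as a contiguous range of CPU ids and ignores the allowed-CPUs mask entirely. The struct group type, the idle[] array, and both function names are hypothetical and exist only for this example. The first function follows the loop above (a group is used only if every CPU in it is idle and none is the target); the second is the relaxed behaviour I am asking about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sched_group: a contiguous range of CPU ids. */
struct group {
	int first;
	int last;
};

/*
 * Strict scan, modelled on the loop above: a group qualifies only when
 * every CPU in it is idle and none of them is the target. Returns the
 * first CPU of the first qualifying group, or -1 if there is none.
 */
static int pick_fully_idle_group(const struct group *groups, int nr_groups,
				 const bool *idle, int target)
{
	assert(groups != NULL && idle != NULL);

	for (int g = 0; g < nr_groups; g++) {
		bool usable = true;

		for (int i = groups[g].first; i <= groups[g].last; i++) {
			if (i == target || !idle[i]) {
				usable = false;	/* one bad CPU rejects the group */
				break;
			}
		}
		if (usable)
			return groups[g].first;
	}
	return -1;
}

/*
 * Relaxed scan: return the first idle CPU that is not the target, even
 * if other CPUs in the same group are busy. Returns -1 if none is idle.
 */
static int pick_any_idle_cpu(const struct group *groups, int nr_groups,
			     const bool *idle, int target)
{
	assert(groups != NULL && idle != NULL);

	for (int g = 0; g < nr_groups; g++) {
		for (int i = groups[g].first; i <= groups[g].last; i++) {
			if (i != target && idle[i])
				return i;
		}
	}
	return -1;
}
```

With four two-CPU groups {0,1}, {2,3}, {4,5}, {6,7}, CPUs 1, 2, 5 and 6 idle, and target 0, every group contains either the target or a busy CPU. The strict scan finds nothing and returns -1, while the relaxed scan returns CPU 1.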

We walk all the scheduling groups in the scheduling domain, and if any CPU in a group is not idle, or is the target, we skip the whole group. Isn't a scheduling group a group of CPUs? Why can't we pick an idle CPU in a group that also contains a non-idle CPU or the target CPU? Thanks,

Josef