[PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

From: Josef Bacik
Date: Wed May 27 2015 - 17:22:51 EST


[ sorry if you get this twice, it seems like the first submission got lost ]

At Facebook we have a heavily multi-threaded application that is sensitive to
latency. We have been forward-porting the old SD_WAKE_IDLE code because it
gives us a significant performance gain (roughly 20%). It turns out this is
because there are cases where the scheduler puts our task on a busy CPU while
there are idle CPUs in the system. We verified this by reading
cpu_delay_req_avg_us from the scheduler's netlink interface. With our quick
forward-ported patch, these numbers are much lower than baseline.

SD_BALANCE_WAKE is supposed to find us an idle CPU to run on. However, it only
looks for an idle sibling, preferring affinity over everything else. This is not
helpful in all cases: SD_BALANCE_WAKE's job is to find us an idle CPU, not to
guarantee affinity. Fix this by first trying to find an idle sibling and then,
if the CPU it returns is not idle, falling through to the logic that searches
for an idle CPU. With this patch we get slightly better performance than with
our forward port of SD_WAKE_IDLE. Thanks,

Signed-off-by: Josef Bacik <jbacik@xxxxxx>
Acked-by: Rik van Riel <riel@xxxxxxxxxx>
---
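A rough user-space sketch of the wake-up decision this patch changes may make the
intent easier to review. It assumes a hypothetical toy machine of 8 CPUs split
into two 4-CPU LLC domains. All toy_* names, the topology, and the cpu_is_idle[]
array are illustrative assumptions: toy_select_idle_sibling() and
toy_find_idle_cpu() are heavily simplified stand-ins for select_idle_sibling()
and the "while (sd)" walk in select_task_rq_fair(), not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy topology: 8 CPUs, two LLC domains of 4 CPUs each. */
#define NR_CPUS 8
#define CPUS_PER_LLC 4

/* Toy per-CPU idle state; stands in for the kernel's idle_cpu(). */
static bool cpu_is_idle[NR_CPUS];

static bool toy_idle_cpu(int cpu)
{
	return cpu_is_idle[cpu];
}

/*
 * Simplified stand-in for select_idle_sibling(): prefer the target if idle,
 * else any idle CPU sharing its LLC, else return the target unchanged.
 */
static int toy_select_idle_sibling(int target)
{
	int first = (target / CPUS_PER_LLC) * CPUS_PER_LLC;

	if (toy_idle_cpu(target))
		return target;
	for (int cpu = first; cpu < first + CPUS_PER_LLC; cpu++)
		if (toy_idle_cpu(cpu))
			return cpu;
	return target;
}

/*
 * Simplified stand-in for the domain walk: scan the whole machine for any
 * idle CPU, falling back to the caller's choice if none is found.
 */
static int toy_find_idle_cpu(int fallback)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (toy_idle_cpu(cpu))
			return cpu;
	return fallback;
}

/*
 * Wake-up CPU selection. Without the patch we always stop after the sibling
 * search; with it we only stop there if the chosen CPU is actually idle.
 */
static int toy_select_wake_cpu(int prev_cpu, bool patched)
{
	int new_cpu;

	assert(prev_cpu >= 0 && prev_cpu < NR_CPUS);
	new_cpu = toy_select_idle_sibling(prev_cpu);
	if (!patched || toy_idle_cpu(new_cpu))
		return new_cpu;
	return toy_find_idle_cpu(new_cpu);
}
```

For example, with CPUs 0-3 all busy and only CPU 6 idle, waking a task whose
prev_cpu is 1 stays on the busy CPU 1 without the patch, but lands on CPU 6
with it. When an idle sibling exists in the same LLC, both paths pick it, so
affinity is still preferred whenever it is free.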
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 241213b..03dafa3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4766,7 +4766,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 
 	if (sd_flag & SD_BALANCE_WAKE) {
 		new_cpu = select_idle_sibling(p, prev_cpu);
-		goto unlock;
+		if (idle_cpu(new_cpu))
+			goto unlock;
 	}
 
 	while (sd) {
--
1.8.1
