Re: [PATCH v4] sched: Fast idling of CPU when system is partially loaded

From: Tim Chen
Date: Tue Jun 24 2014 - 12:16:36 EST


On Mon, 2014-06-23 at 12:16 -0700, Tim Chen wrote:
> Thanks to the review from Jason, Andi and Peter. I've updated
> the code as Peter suggested with simplified logic.
>
> When a system is lightly loaded (i.e. no more than 1 job per cpu),
> attempting to pull a job to a cpu before putting it to idle is
> unnecessary and can be skipped. This patch adds an indicator so the
> scheduler can know when there is no more than 1 active job on any
> CPU in the system, and skip the needless job pulls.
>
> On a 4 socket machine running a request/response kind of workload
> from clients, we saw about 0.13 msec of delay whenever we went
> through a full load balance to try to pull a job from all the other
> cpus. Since only 0.1 msec was spent processing the request and
> generating a response, the 0.13 msec load balance overhead exceeded
> the actual work being done. This overhead can be skipped much of the
> time on lightly loaded systems.
>
> With this patch, we tested a netperf request/response workload that
> keeps the server busy on half the cpus of a 4 socket system. We
> found the patch eliminated 75% of the load balance attempts made
> before idling a cpu.
>
> The overhead of setting/clearing the indicator is low, as we already
> gather the necessary info when we call add_nr_running and
> update_sd_lb_stats. We switch to full load balancing immediately
> once any cpu gets more than one job on its run queue in
> add_nr_running. We clear the indicator, allowing the load balance to
> be skipped, when the scan of the run queues in update_sg_lb_stats
> detects that no cpu has more than one job. In other words, we are
> aggressive in turning on load balancing and opportunistic in
> skipping it.
>
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Acked-by: Jason Low <jason.low2@xxxxxx>

Peter,

I need to fix up the code so that the update of the indicator is
guarded by the CONFIG_SMP compile flag; rq->rd (the root domain
pointer) only exists in struct rq on SMP builds, so the unguarded
update would break UP compiles.

A complete updated patch is also attached.

Thanks.

Tim

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6d25f1d..d051712 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1222,9 +1222,10 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
 	rq->nr_running = prev_nr + count;
 
 	if (prev_nr < 2 && rq->nr_running >= 2) {
+#ifdef CONFIG_SMP
 		if (!rq->rd->overload)
 			rq->rd->overload = true;
-
+#endif
 #ifdef CONFIG_NO_HZ_FULL
 		if (tick_nohz_full_cpu(rq->cpu)) {
 			/* Order rq->nr_running write against the IPI */
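
For anyone skimming the thread, the set/clear/consume protocol the
changelog describes can be summarized with a stand-alone toy model.
To be clear, this is not kernel code: TOY_NR_CPUS, the structs and
the toy_* functions below are illustrative stand-ins, and the real
patch (attached further down) is the authoritative version. The
indicator is set eagerly the moment any run queue reaches two
runnable tasks, cleared only when a full scan finds no overloaded
cpu, and read by a cpu about to go idle so it can skip the pull:

#include <stdbool.h>
#include <stdio.h>

#define TOY_NR_CPUS	4

struct toy_root_domain { bool overload; };
struct toy_rq { unsigned nr_running; struct toy_root_domain *rd; };

static struct toy_root_domain toy_rd;
static struct toy_rq toy_rqs[TOY_NR_CPUS];

/* Set side (mirrors add_nr_running): flag the root domain as
 * overloaded the moment any rq transitions to >= 2 tasks. */
static void toy_add_nr_running(struct toy_rq *rq, unsigned count)
{
	unsigned prev_nr = rq->nr_running;

	rq->nr_running = prev_nr + count;
	if (prev_nr < 2 && rq->nr_running >= 2 && !rq->rd->overload)
		rq->rd->overload = true;
}

/* Clear side (mirrors the update_sd_lb_stats scan): only a full
 * pass that sees no rq with more than one task clears the flag. */
static void toy_update_sd_lb_stats(void)
{
	bool overload = false;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_rqs[cpu].nr_running > 1)
			overload = true;

	if (toy_rd.overload != overload)
		toy_rd.overload = overload;
}

/* Consume side: a cpu about to idle skips the pull entirely when
 * nothing anywhere is overloaded. */
static bool toy_should_idle_balance(struct toy_rq *this_rq)
{
	return this_rq->rd->overload;
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_rqs[cpu].rd = &toy_rd;

	toy_add_nr_running(&toy_rqs[0], 1);	/* 1 task: still light */
	printf("pull before idle? %d\n", toy_should_idle_balance(&toy_rqs[1]));

	toy_add_nr_running(&toy_rqs[0], 1);	/* 2 tasks: overloaded */
	printf("pull before idle? %d\n", toy_should_idle_balance(&toy_rqs[1]));

	toy_rqs[0].nr_running = 1;	/* a task finished */
	toy_update_sd_lb_stats();	/* periodic scan clears the flag */
	printf("pull before idle? %d\n", toy_should_idle_balance(&toy_rqs[1]));

	return 0;
}

The asymmetry is deliberate: setting the flag is a single cheap store
on an event we already handle, while clearing needs the global view
we only get during the load balance statistics scan, so a stale
"overloaded" flag costs at most one extra balance attempt rather
than a missed pull.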



The complete updated patch is attached below:
---