[PATCH 1/3] sched: enable interrupts and drop rq-lock during newidle balancing

From: Gregory Haskins
Date: Fri Jun 27 2008 - 16:30:54 EST


Oprofile data shows that the system may spend a significant amount of
time (60%+) in find_busiest_group() as a result of newidle balancing. This
entire operation is a critical section, since it occurs inline with
schedule(). Since we already run find_busiest_group() et al. without locks
held for normal balancing, let's do the same for newidle. This will
at least allow other cpus and interrupts to make forward progress
(against our RQ) while we try to balance.

Additionally, we abort the newidle processing if we are preempted.

This patch should improve both latency response and throughput. It has
been shown to contribute significantly to a 6-12% increase in network
performance.

Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
---
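For readers who want to see the locking pattern outside the kernel, here
is a minimal userspace sketch of the idea: drop our own lock for the
expensive scan, then retake both locks in a fixed order and recheck our
state before pulling anything. It is an illustration only, not the kernel
code. Everything named toy_* is hypothetical, pthread mutexes stand in
for the rq spinlocks, and the "scan" is a trivial placeholder for
find_busiest_group(). There is no analogue of irq disabling here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a runqueue: a lock plus a task count. */
struct toy_rq {
	pthread_mutex_t lock;
	atomic_int nr_running;	/* atomic so the unlocked peek is not a data race */
};

/* Hypothetical stand-in for the expensive scan, done with no locks held. */
static int toy_find_imbalance(struct toy_rq *busiest)
{
	int n = atomic_load_explicit(&busiest->nr_running,
				     memory_order_relaxed);

	/* The result is only a hint; it is rechecked under the locks. */
	return n > 1 ? n / 2 : 0;
}

/* Take both locks in address order so two callers cannot deadlock. */
static void toy_double_lock(struct toy_rq *a, struct toy_rq *b)
{
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(&a->lock);
		pthread_mutex_lock(&b->lock);
	} else {
		pthread_mutex_lock(&b->lock);
		pthread_mutex_lock(&a->lock);
	}
}

/*
 * Called with this_rq->lock held, returns with it held.
 * Returns the number of tasks pulled from busiest.
 */
static int toy_newidle_balance(struct toy_rq *this_rq, struct toy_rq *busiest)
{
	int imbalance, avail, moved = 0;

	assert(this_rq != busiest);

	/* Drop our lock so others can make progress against this_rq. */
	pthread_mutex_unlock(&this_rq->lock);

	imbalance = toy_find_imbalance(busiest);
	if (imbalance == 0) {
		pthread_mutex_lock(&this_rq->lock);
		return 0;
	}

	toy_double_lock(this_rq, busiest);

	/* Recheck: work may have been queued on us while we were unlocked. */
	if (atomic_load(&this_rq->nr_running) == 0) {
		avail = atomic_load(&busiest->nr_running) - 1;
		moved = imbalance < avail ? imbalance : avail;
		if (moved < 0)
			moved = 0;
		atomic_fetch_sub(&busiest->nr_running, moved);
		atomic_fetch_add(&this_rq->nr_running, moved);
	}

	pthread_mutex_unlock(&busiest->lock);
	return moved;
}
```

The recheck of nr_running after reacquiring the locks plays the same role
as the this_rq->nr_running test in the patch: if we are no longer idle by
the time we hold the locks again, we give up instead of pulling tasks.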

kernel/sched.c | 40 +++++++++++++++++++++++++++++++++-------
1 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index c51d9fa..56722b1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3426,6 +3426,15 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd,

cpus_setall(*cpus);

+ schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
+
+ /*
+ * We are in a preempt-disabled section, so dropping the lock/irq
+ * here simply means that other cores may acquire the lock,
+ * and interrupts may occur.
+ */
+ spin_unlock_irq(&this_rq->lock);
+
/*
* When power savings policy is enabled for the parent domain, idle
* sibling can pick up load irrespective of busy siblings. In this case,
@@ -3436,7 +3445,6 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd,
!test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
sd_idle = 1;

- schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
redo:
group = find_busiest_group(sd, this_cpu, &imbalance, CPU_NEWLY_IDLE,
&sd_idle, cpus, NULL);
@@ -3456,22 +3464,37 @@ redo:
schedstat_add(sd, lb_imbalance[CPU_NEWLY_IDLE], imbalance);

ld_moved = 0;
- if (busiest->nr_running > 1) {
+ if (!need_resched() && busiest->nr_running > 1) {
/* Attempt to move tasks */
- double_lock_balance(this_rq, busiest);
- /* this_rq->clock is already updated */
- update_rq_clock(busiest);
+ local_irq_disable();
+ double_rq_lock(this_rq, busiest);
+
+ BUG_ON(this_cpu != smp_processor_id());
+
+ /*
+ * Checking rq->nr_running covers both the case where
+ * newidle-balancing pulls a task, as well as if something
+ * else issued a NEEDS_RESCHED (since we would only need
+ * a reschedule if something was moved to us)
+ */
+ if (this_rq->nr_running) {
+ spin_unlock(&busiest->lock);
+ goto out_balanced_locked;
+ }
+
ld_moved = move_tasks(this_rq, this_cpu, busiest,
imbalance, sd, CPU_NEWLY_IDLE,
&all_pinned);
spin_unlock(&busiest->lock);

- if (unlikely(all_pinned)) {
+ if (unlikely(all_pinned && !this_rq->nr_running)) {
+ spin_unlock_irq(&this_rq->lock);
cpu_clear(cpu_of(busiest), *cpus);
if (!cpus_empty(*cpus))
goto redo;
}
- }
+ } else
+ spin_lock_irq(&this_rq->lock);

if (!ld_moved) {
schedstat_inc(sd, lb_failed[CPU_NEWLY_IDLE]);
@@ -3484,6 +3507,9 @@ redo:
return ld_moved;

out_balanced:
+ spin_lock_irq(&this_rq->lock);
+
+out_balanced_locked:
schedstat_inc(sd, lb_balanced[CPU_NEWLY_IDLE]);
if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
!test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
