[PATCH RFC] smt nice introduces significant lock contention

From: Chris Mason
Date: Thu Jun 01 2006 - 18:54:52 EST


Hello everyone,

Recent benchmarks showed some performance regressions between 2.6.5 and
2.6.16. We tracked one of the regressions down to lock contention in
schedule-heavy workloads (~70,000 context switches per second).

kernel/sched.c:dependent_sleeper() was responsible for most of the lock
contention, hammering on the runqueue locks. The patch below is more of
a discussion point than a suggested fix (although it does reduce lock
contention significantly): instead of unconditionally taking every
sibling's runqueue lock in CPU order, it trylocks them and simply skips
the pass when any of them is contended. The dependent_sleeper() code
looks very expensive to me, especially for using spinlocks to bounce
control between two siblings of the same physical CPU.

--- a/kernel/sched.c Thu May 18 15:55:43 2006 -0400
+++ b/kernel/sched.c Tue May 23 21:13:52 2006 -0400
@@ -2630,6 +2630,30 @@ static inline void wakeup_busy_runqueue(
 	resched_task(rq->idle);
 }
 
+/* Lock all runqueues in sibling_map, or none: on a failed trylock,
+ * drop the locks already taken and return 0 so the caller can skip
+ * this pass instead of waiting on a contended lock. */
+static int trylock_smt_cpus(cpumask_t sibling_map)
+{
+	int ret = 1;
+	int numlocked = 0;
+	int i;
+	for_each_cpu_mask(i, sibling_map) {
+		ret = spin_trylock(&cpu_rq(i)->lock);
+		if (!ret)
+			break;
+		numlocked++;
+	}
+	if (ret || !numlocked)
+		return ret;
+	for_each_cpu_mask(i, sibling_map) {
+		spin_unlock(&cpu_rq(i)->lock);
+		if (--numlocked == 0)
+			break;
+	}
+	return 0;
+}
+
 static void wake_sleeping_dependent(int this_cpu, runqueue_t *this_rq)
 {
 	struct sched_domain *tmp, *sd = NULL;
@@ -2643,22 +2667,16 @@ static void wake_sleeping_dependent(int
 	if (!sd)
 		return;
 
-	/*
-	 * Unlock the current runqueue because we have to lock in
-	 * CPU order to avoid deadlocks. Caller knows that we might
-	 * unlock. We keep IRQs disabled.
-	 */
-	spin_unlock(&this_rq->lock);
-
 	sibling_map = sd->span;
 
-	for_each_cpu_mask(i, sibling_map)
-		spin_lock(&cpu_rq(i)->lock);
 	/*
 	 * We clear this CPU from the mask. This both simplifies the
 	 * inner loop and keps this_rq locked when we exit:
 	 */
 	cpu_clear(this_cpu, sibling_map);
+
+	if (!trylock_smt_cpus(sibling_map))
+		return;
 
 	for_each_cpu_mask(i, sibling_map) {
 		runqueue_t *smt_rq = cpu_rq(i);
@@ -2703,11 +2721,10 @@ static int dependent_sleeper(int this_cp
 	 * The same locking rules and details apply as for
 	 * wake_sleeping_dependent():
 	 */
-	spin_unlock(&this_rq->lock);
 	sibling_map = sd->span;
-	for_each_cpu_mask(i, sibling_map)
-		spin_lock(&cpu_rq(i)->lock);
 	cpu_clear(this_cpu, sibling_map);
+	if (!trylock_smt_cpus(sibling_map))
+		return 0;
 
 	/*
 	 * Establish next task to be run - it might have gone away because