volano ~30% regression with 2.6.33-rc1 & -rc2

From: Lin Ming
Date: Mon Jan 04 2010 - 03:31:44 EST


Mike & Peter,

Compared with 2.6.32, volano shows a ~30% regression with 2.6.33-rc1 & -rc2.
Test machine: Tigerton Xeon, 16 CPUs (4P/4 cores), 16 GB memory

Bisection points to the commit below:

commit a1f84a3ab8e002159498814eaa7e48c33752b04b
Author: Mike Galbraith <efault@xxxxxx>
Date: Tue Oct 27 15:35:38 2009 +0100

    sched: Check for an idle shared cache in select_task_rq_fair()

    When waking affine, check for an idle shared cache, and if
    found, wake to that CPU/sibling instead of the waker's CPU.

    This improves pgsql+oltp ramp up by roughly 8%. Possibly more
    for other loads, depending on overlap. The trade-off is a
    roughly 1% peak downturn if tasks are truly synchronous.

    Signed-off-by: Mike Galbraith <efault@xxxxxx>
    Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxx>
    Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
    Cc: <stable@xxxxxxxxxx>
    LKML-Reference: <1256654138.17752.7.camel@xxxxxxxxxxxxxxxx>
    Signed-off-by: Ingo Molnar <mingo@xxxxxxx>


This commit cannot be reverted on its own because of conflicts, so I
reverted the 4 idle-shared-cache commits below on 2.6.33-rc2, after which
performance returned to the 2.6.32 level.

fe3bcfe (sched: More generic WAKE_AFFINE vs select_idle_sibling())
a50bde5 (sched: Cleanup select_task_rq_fair())
fd21073 (sched: Fix affinity logic in select_task_rq_fair())
a1f84a3 (sched: Check for an idle shared cache in select_task_rq_fair())

This regression seems to be caused by cache misses when accessing per-CPU
data (see the perf top cache-misses data below for details).

select_idle_sibling(...)
{
        ....
        for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed) {
                if (!cpu_rq(i)->cfs.nr_running) {
                        target = i;
                        break;
                }
        }
        ....
}
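
To illustrate why this loop might be expensive on a loaded machine, here is
a minimal user-space sketch. It is not the kernel code: toy_rq,
toy_runqueues, toy_select_idle_sibling, the bitmask CPU sets and the 64-byte
cache line size are all assumptions made for illustration. It only counts how
many per-CPU runqueue entries the scan reads before it finds an idle CPU or
gives up.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define NR_CPUS    16   /* matches the 16-CPU test machine */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/*
 * Hypothetical stand-in for a per-CPU runqueue. Each entry sits on its
 * own cache line, which is written mostly by the CPU that owns it.
 */
struct toy_rq {
        alignas(CACHE_LINE) unsigned long nr_running;
};

static struct toy_rq toy_runqueues[NR_CPUS];

/*
 * Walk the CPUs present in both 'span' and 'allowed' (bit i = CPU i) and
 * return the first one with nothing running, or 'target' if none is idle.
 * '*touched' reports how many runqueue entries were read on the way.
 */
static int toy_select_idle_sibling(int target, unsigned int span,
                                   unsigned int allowed, int *touched)
{
        unsigned int candidates = span & allowed;
        int i;

        assert(touched != NULL);
        *touched = 0;

        for (i = 0; i < NR_CPUS; i++) {
                if (!(candidates & (1u << i)))
                        continue;
                (*touched)++;           /* one read of another CPU's data */
                if (!toy_runqueues[i].nr_running) {
                        target = i;
                        break;
                }
        }
        return target;
}
```

When every candidate CPU is busy, each call reads all candidate entries and
still returns the original target. Each of those entries lives on a line its
owning CPU keeps updating, which would be consistent with the cache-miss
profile, although this sketch does not measure misses itself.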

Performance also returns to the 2.6.32 level if SD_PREFER_SIBLING is not
set, in which case select_idle_sibling() is not called.

perf top data follows:

2.6.33-rc1 cache-misses data (note 11.8% select_task_rq_fair)
------------------------------------------------------------------------------------
PerfTop: 12262 irqs/sec kernel:90.6% [1000Hz cache-misses], (all, 16 CPUs)
------------------------------------------------------------------------------------

 samples  pcnt function                      DSO
 _______ _____ _____________________________ ________________

18272.00 11.8% select_task_rq_fair           [kernel.kallsyms]
15499.00 10.0% schedule                      [kernel.kallsyms]
 9447.00  6.1% update_curr                   [kernel.kallsyms]
 9255.00  6.0% _raw_spin_lock                [kernel.kallsyms]
 5161.00  3.3% tcp_sendmsg                   [kernel.kallsyms]

2.6.32 cache-misses data
--------------------------------------------------------------------------------------
PerfTop: 11749 irqs/sec kernel:88.2% [1000Hz cache-misses], (all, 16 CPUs)
--------------------------------------------------------------------------------------

 samples  pcnt function                      DSO
 _______ _____ _____________________________ _________________

11974.00 11.5% schedule                      [kernel.kallsyms]
 6656.00  6.4% _spin_lock                    [kernel.kallsyms]
 5852.00  5.6% update_curr                   [kernel.kallsyms]
 3140.00  3.0% enqueue_entity                [kernel.kallsyms]
 2846.00  2.7% tcp_sendmsg                   [kernel.kallsyms]

2.6.33-rc1 cycles data (note 6.5% select_task_rq_fair)
-------------------------------------------------------------------------------
PerfTop: 11106 irqs/sec kernel:99.7% [1000Hz cycles], (all, 16 CPUs)
-------------------------------------------------------------------------------

 samples  pcnt function                  DSO
 _______ _____ _________________________ _________________

11658.00 10.0% schedule                  [kernel.kallsyms]
10870.00  9.4% _raw_spin_lock            [kernel.kallsyms]
 7576.00  6.5% select_task_rq_fair       [kernel.kallsyms]
 3696.00  3.2% tcp_sendmsg               [kernel.kallsyms]
 3000.00  2.6% update_curr               [kernel.kallsyms]

2.6.32 cycles data
------------------------------------------------------------------------------------
PerfTop: 10462 irqs/sec kernel:99.8% [1000Hz cycles], (all, 16 CPUs)
------------------------------------------------------------------------------------

 samples  pcnt function                  DSO
 _______ _____ _________________________ _________________

13364.00  9.9% schedule                  [kernel.kallsyms]
13140.00  9.8% _spin_lock                [kernel.kallsyms]
 4903.00  3.6% tcp_sendmsg               [kernel.kallsyms]
 4017.00  3.0% update_curr               [kernel.kallsyms]
 3395.00  2.5% _spin_lock_bh             [kernel.kallsyms]


Lin Ming

