[quad core results] BFS vs. mainline scheduler benchmarks and measurements

From: Ingo Molnar
Date: Mon Sep 07 2009 - 08:17:28 EST

* Frans Pop <elendil@xxxxxxxxx> wrote:

> Ingo Molnar wrote:
> > So the testbox i picked fits into the upper portion of what i
> > consider a sane range of systems to tune for - and should still fit
> > into BFS's design bracket as well according to your description:
> > it's a dual quad core system with hyperthreading.
> Ingo,
> Nice that you've looked into this.
> Would it be possible for you to run the same tests on e.g. a dual
> core and/or a UP system (or maybe just offline some CPUs?)? It
> would be very interesting to see whether BFS does better in the
> lower portion of the range, or if the differences you show between
> the two schedulers are consistent across the range.


Note that usually we can extrapolate ballpark-figure quad and dual
socket results from 8 core results. Trends as drastic as the ones
i reported do not get reversed as one shrinks the number of cores.

[ This technique is not universal - for example borderline graphs
cannot be extrapolated down reliably - but the graphs i
posted were far from borderline. ]

Con posted single-socket quad comparisons/graphs so to make it 100%
apples to apples i re-tested with a single-socket (non-NUMA) quad as
well, and have uploaded the new graphs/results to:

kernel build performance on quad:

pipe performance on quad:

messaging performance (hackbench) on quad:

OLTP performance (postgresql + sysbench) on quad:

It shows similar curves and behavior to the 8-core results i posted
- BFS is slower than mainline in virtually every measurement. The
ratios are different for different parts of the graphs - but the
trend is similar.

I also re-ran a few standalone kernel latency tests with a single-socket
quad.

lat_tcp quad result:

BFS: TCP latency using localhost: 16.9926 microseconds
sched-devel: TCP latency using localhost: 12.4141 microseconds [36.8% faster]

as a comparison, the 8 core lat_tcp result was:

BFS: TCP latency using localhost: 16.5608 microseconds
sched-devel: TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe quad result:

BFS: Pipe latency: 4.6978 microseconds
sched-devel: Pipe latency: 2.6860 microseconds [74.8% faster]

as a comparison, the 8 core lat_pipe result was:

BFS: Pipe latency: 4.9703 microseconds
sched-devel: Pipe latency: 2.6137 microseconds [90.1% faster]
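
The bracketed percentages above appear to be the BFS latency divided by the sched-devel latency, minus one, cut off (not rounded) at one decimal place. That reading is inferred from the numbers rather than stated in the mail. A minimal sketch that reproduces them under that assumption, with pct_faster as a hypothetical helper name:

```shell
#!/bin/sh
# pct_faster SLOW FAST: how much faster FAST is than SLOW, in percent.
# Assumption (not stated in the mail): the figure is
# (SLOW / FAST - 1) * 100, truncated to one decimal place.
pct_faster() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", int((slow / fast - 1) * 1000) / 10 }'
}

pct_faster 16.9926 12.4141   # lat_tcp, quad    -> 36.8
pct_faster 16.5608 13.5528   # lat_tcp, 8 core  -> 22.1
pct_faster 4.6978 2.6860     # lat_pipe, quad   -> 74.8
pct_faster 4.9703 2.6137     # lat_pipe, 8 core -> 90.1
```

Rounding instead of truncating would shift each figure up by 0.1, so the difference is cosmetic and does not change the comparison.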

On the desktop interactivity front, i also still saw that bad
starvation artifact with BFS with multiple copies of CPU-bound
pipe-test-1m.c running in parallel:


Start up a few copies of them like this:

for ((i=0;i<32;i++)); do ./pipe-test-1m & done

and the quad eventually came to a halt here - until the tasks
finished running.
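
For anyone repeating this, a variant of the loop above that also blocks until every copy has exited makes it easier to see when the box becomes responsive again. This is only a sketch: run_copies is a hypothetical helper name, and ./pipe-test-1m is assumed to be the binary built from pipe-test-1m.c in the current directory.

```shell
#!/bin/sh
# run_copies N CMD [ARGS...]: start N background copies of CMD,
# then block until all of them have exited.
run_copies() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &            # one background copy per iteration
        i=$((i + 1))
    done
    wait                  # returns once every background copy is done
}

# Example (assumes ./pipe-test-1m exists):
#   time run_copies 32 ./pipe-test-1m
```

Running it under time gives a rough wall-clock figure for the whole batch, which can be compared between the two kernels while watching desktop responsiveness from another terminal.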

I also tested a few key data points on dual core and it shows
similar trends as well (as expected from the 8 and 4 core results).

But ... i'd really encourage everyone to test these things themselves
as well and not take anyone's word for it. The more
people provide numbers, the better. The latest BFS patch can be
found at:


The mainline sched-devel tree can be found at:


