Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()

From: Peter Zijlstra
Date: Thu May 26 2011 - 21:07:41 EST


On Wed, 2011-04-27 at 19:32 +0900, KOSAKI Motohiro wrote:
>
> I made a proof-of-concept patch today. The result is better than I expected.
>
> <before>
> Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
>
>  1603777813  cache-references       #  56.987 M/sec  ( +- 1.824% )  (scaled from 25.36%)
>    13780381  cache-misses           #   0.490 M/sec  ( +- 1.360% )  (scaled from 25.55%)
> 24872032348  L1-dcache-loads        # 883.770 M/sec  ( +- 0.666% )  (scaled from 25.51%)
>   640394580  L1-dcache-load-misses  #  22.755 M/sec  ( +- 0.796% )  (scaled from 25.47%)
>
> 14.162411769 seconds time elapsed ( +- 0.675% )
>
> <after>
> Performance counter stats for 'hackbench 10 thread 1000' (10 runs):
>
>  1416147603  cache-references       #  51.566 M/sec  ( +- 4.407% )  (scaled from 25.40%)
>    10920284  cache-misses           #   0.398 M/sec  ( +- 5.454% )  (scaled from 25.56%)
> 24666962632  L1-dcache-loads        # 898.196 M/sec  ( +- 1.747% )  (scaled from 25.54%)
>   598640329  L1-dcache-load-misses  #  21.798 M/sec  ( +- 2.504% )  (scaled from 25.50%)
>
> 13.812193312 seconds time elapsed ( +- 1.696% )
>
> * Detailed data is in result.txt
>
>
> The trick is:
> - Typical Linux userland applications don't use the mempolicy and/or
>   cpusets APIs at all.
> - So for 99.99% of threads, tsk->cpus_allowed equals cpu_all_mask.
> - In the cpu_all_mask case, every thread can share the same bitmap. That
>   may help reduce L1 cache misses in the scheduler.
>
> What do you think?

Nice!

If you finish the first patch (sort the TODOs) I'll take it.

I'm unsure about the PF_THREAD_UNBOUND thing, though. Then again, the
alternative is adding another struct cpumask * and having that point to
either the shared mask or the private mask.

But yeah, looks quite feasible.
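
To make the "another struct cpumask *" alternative concrete, here is a
rough userspace sketch. Every name in it (sketch_cpumask, sketch_task,
sketch_all_mask() and so on) is made up for illustration and is not a
kernel API; the mask size is an arbitrary assumption, and locking is
ignored entirely. It only shows the idea that unbound tasks point at one
shared all-CPUs bitmap and a task falls back to its private copy once its
affinity is restricted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes; a real kernel would derive these from NR_CPUS. */
#define SKETCH_NR_CPUS       64
#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))
#define SKETCH_MASK_LONGS    (SKETCH_NR_CPUS / SKETCH_BITS_PER_LONG)

struct sketch_cpumask {
	unsigned long bits[SKETCH_MASK_LONGS];
};

struct sketch_task {
	/* Points at the shared all-CPUs mask or at private_mask below. */
	const struct sketch_cpumask *cpus_allowed_ptr;
	/* Only meaningful once the task's affinity has been restricted. */
	struct sketch_cpumask private_mask;
};

/* One all-CPUs mask shared by every unbound task (not thread safe). */
static const struct sketch_cpumask *sketch_all_mask(void)
{
	static struct sketch_cpumask all;
	static bool ready;

	if (!ready) {
		memset(&all, 0xff, sizeof(all));
		ready = true;
	}
	return &all;
}

/* New tasks start unbound, so they reference the shared mask. */
static void sketch_task_init(struct sketch_task *t)
{
	memset(&t->private_mask, 0, sizeof(t->private_mask));
	t->cpus_allowed_ptr = sketch_all_mask();
}

/* True if the task still uses the shared all-CPUs mask. */
static bool sketch_task_unbound(const struct sketch_task *t)
{
	return t->cpus_allowed_ptr == sketch_all_mask();
}

/* Use the shared mask when possible, otherwise keep a private copy. */
static void sketch_set_affinity(struct sketch_task *t,
				const struct sketch_cpumask *new_mask)
{
	if (memcmp(new_mask, sketch_all_mask(), sizeof(*new_mask)) == 0) {
		t->cpus_allowed_ptr = sketch_all_mask();
		return;
	}
	t->private_mask = *new_mask;
	t->cpus_allowed_ptr = &t->private_mask;
}

/* Readers only follow the pointer and never care which mask it is. */
static bool sketch_cpu_allowed(const struct sketch_task *t, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return (t->cpus_allowed_ptr->bits[cpu / SKETCH_BITS_PER_LONG]
		>> (cpu % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

With this layout the hot-path readers dereference one pointer, and for
the common unbound case they all land on the same cache lines. The cost
is the extra pointer per task, which is the trade-off against a flag
like PF_THREAD_UNBOUND.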
