Re: [PATCH] cpumask: convert cpumask_of_cpu() with cpumask_of()

From: KOSAKI Motohiro
Date: Wed Apr 27 2011 - 06:32:29 EST


> > But why? Are we going to get rid of cpumask_t (which is a fixed sized
> > struct, so direct assignment is perfectly fine)?
> >
> > Also, do we want to convert cpus_allowed to cpumask_var_t? It would save
> > quite a lot of memory on distro configs that set NR_CPUS silly high.
> > Currently NR_CPUS=4096 configs allocate 512 bytes per task for this
> > bitmap, 511 of which will never be used on most machines (510 in the
> > near future).
> >
> > The cost is of course an extra memory dereference in scheduler hot
> > paths.. also not nice.

Measurement data will probably say more than my poor English...

I made a proof-of-concept patch today. The result is better than I expected.

<before>
Performance counter stats for 'hackbench 10 thread 1000' (10 runs):

 1603777813  cache-references       #   56.987 M/sec  ( +- 1.824% )  (scaled from 25.36%)
   13780381  cache-misses           #    0.490 M/sec  ( +- 1.360% )  (scaled from 25.55%)
24872032348  L1-dcache-loads        #  883.770 M/sec  ( +- 0.666% )  (scaled from 25.51%)
  640394580  L1-dcache-load-misses  #   22.755 M/sec  ( +- 0.796% )  (scaled from 25.47%)

14.162411769 seconds time elapsed ( +- 0.675% )

<after>
Performance counter stats for 'hackbench 10 thread 1000' (10 runs):

 1416147603  cache-references       #   51.566 M/sec  ( +- 4.407% )  (scaled from 25.40%)
   10920284  cache-misses           #    0.398 M/sec  ( +- 5.454% )  (scaled from 25.56%)
24666962632  L1-dcache-loads        #  898.196 M/sec  ( +- 1.747% )  (scaled from 25.54%)
  598640329  L1-dcache-load-misses  #   21.798 M/sec  ( +- 2.504% )  (scaled from 25.50%)

13.812193312 seconds time elapsed ( +- 1.696% )

* Detailed data is in result.txt
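
To compare the two runs, the relative change of each counter can be computed directly from the numbers above. The following is only a small illustrative sketch: pct_change(), struct sample and print_changes() are hypothetical helper names, not part of perf or of the attached patches. The values are copied from the <before> and <after> output.

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent (negative = decrease). */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Hypothetical container for one counter from the two perf stat runs. */
struct sample {
    const char *name;
    double before;
    double after;
};

/* Values copied from the <before> and <after> output above. */
static const struct sample samples[] = {
    { "cache-references",      1603777813.0,  1416147603.0 },
    { "cache-misses",            13780381.0,    10920284.0 },
    { "L1-dcache-loads",      24872032348.0, 24666962632.0 },
    { "L1-dcache-load-misses",  640394580.0,   598640329.0 },
    { "seconds elapsed",       14.162411769,  13.812193312 },
};

/* Print one line per counter with its relative change. */
static void print_changes(void)
{
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("%-22s %+7.2f%%\n", samples[i].name,
               pct_change(samples[i].before, samples[i].after));
}
```

When reading the result, keep the reported run-to-run variation in mind (up to +- 5.454% in the <after> run): a change smaller than that margin may simply be noise.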


The trick is:
- Typical Linux userland applications don't use the mempolicy and/or cpusets
  API at all.
- So for 99.99% of threads, tsk->cpus_allowed is equal to cpu_all_mask.
- In the cpu_all_mask case, every thread can share the same bitmap. That may
  help to reduce L1 cache misses in the scheduler.
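
The sharing idea can be sketched in plain userspace C. This is not the attached patch and not kernel code: struct task, all_mask, all_mask_init(), task_init(), task_set_affinity(), task_reset_affinity() and task_cpu_allowed() are hypothetical names made up for illustration, and NR_CPUS=4096 is assumed only to match the configuration discussed in the quoted text.

```c
#include <stdlib.h>
#include <string.h>

#define NR_CPUS    4096            /* assumed distro-style config */
#define MASK_BYTES (NR_CPUS / 8)   /* 512 bytes per bitmap */

/* Simplified stand-in for the kernel's struct cpumask. */
struct cpumask {
    unsigned char bits[MASK_BYTES];
};

/* One shared "all CPUs" bitmap, playing the role of cpu_all_mask. */
static struct cpumask all_mask;

/* Hypothetical task: holds a pointer rather than an embedded bitmap. */
struct task {
    const struct cpumask *cpus_allowed;
};

/* Set every bit of the shared mask; call once before creating tasks. */
static void all_mask_init(void)
{
    memset(all_mask.bits, 0xff, sizeof(all_mask.bits));
}

/* New tasks simply point at the shared mask: no per-task allocation. */
static void task_init(struct task *t)
{
    t->cpus_allowed = &all_mask;
}

/* Drop a private mask, if any, and fall back to the shared one. */
static void task_reset_affinity(struct task *t)
{
    if (t->cpus_allowed != &all_mask)
        free((void *)t->cpus_allowed);
    t->cpus_allowed = &all_mask;
}

/* Only a task that restricts its affinity pays for a private copy. */
static int task_set_affinity(struct task *t, const struct cpumask *mask)
{
    struct cpumask *priv;

    if (memcmp(mask, &all_mask, sizeof(*mask)) == 0) {
        task_reset_affinity(t);
        return 0;
    }
    priv = malloc(sizeof(*priv));
    if (!priv)
        return -1;
    *priv = *mask;
    task_reset_affinity(t);
    t->cpus_allowed = priv;
    return 0;
}

/* Test one CPU bit; this is the extra pointer dereference on the hot path. */
static int task_cpu_allowed(const struct task *t, unsigned int cpu)
{
    return (t->cpus_allowed->bits[cpu / 8] >> (cpu % 8)) & 1;
}
```

In this sketch, tasks that never touch their affinity all read the same 512 bytes, so those cache lines can stay hot across context switches, while the per-task cost shrinks to one pointer. The price is the dereference in task_cpu_allowed(), which is the concern raised in the quoted text.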

What do you think?

Attachment: result.txt
Description: Binary data

Attachment: 0001-s-task-cpus_allowed-tsk_cpus_allowed.patch
Description: Binary data

Attachment: 0002-change-task-cpus_allowed-to-pointer.patch
Description: Binary data