Re: [patch 00/16] CFS Bandwidth Control v7
From: Hidetoshi Seto
Date: Fri Jun 24 2011 - 01:12:00 EST
(2011/06/23 21:43), Ingo Molnar wrote:
>
> * Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
>
>> On Wed, 2011-06-22 at 19:05 +0900, Hidetoshi Seto wrote:
>>
>>> I'll continue my test/benchmark on this v7 for a while. Though I
>>> believe no more bug is there, I'll let you know if there is
>>> something.
>>
>> Would that testing include performance of a kernel without these
>> patches vs one with these patches in a configuration where the new
>> feature is compiled in but not used?
>>
>> It does add a number of if (!cfs_rq->runtime_enabled) return
>> branches all over the place, some possibly inside a function call
>> (depending on what the auto-inliner does). So while the impact
>> should be minimal, it would be very good to test it is indeed so.
>
> Yeah, doing such performance tests is absolutely required. Branches
> and instructions impact should be measured as well, beyond the cycles
> impact.
>
> The changelog of this recent commit:
>
> c8b281161dfa: sched: Increase SCHED_LOAD_SCALE resolution
>
> gives an example of how to do such measurements.
Thank you for the useful guidance!

I've run pipe-test-100k on both a kernel without the patches (3.0-rc4)
and one with the patches (3.0-rc4+), in a similar way to that described
in the changelog you pointed to (but adding "-d" for more details).
I sampled 4 results for each kernel: 3 runs with --repeat 10 plus 1 run
with --repeat 200. Cgroups are not used in either case, so of course
CFS bandwidth control is never exercised on the patched kernel. The
full results are archived and attached; here is a comparison of the
--repeat 200 runs in diff style:
=====
--- /home/seto/bwc-pipe-test/bwc-rc4-orig.txt 2011-06-24 11:52:16.000000000 +0900
+++ /home/seto/bwc-pipe-test/bwc-rc4-patched.txt 2011-06-24 12:08:32.000000000 +0900
[seto@SIRIUS-F14 perf]$ taskset 1 ./perf stat -d -d -d --repeat 200 ../../../pipe-test-100k
Performance counter stats for '../../../pipe-test-100k' (200 runs):
- 865.139070 task-clock # 0.468 CPUs utilized ( +- 0.22% )
- 200,167 context-switches # 0.231 M/sec ( +- 0.00% )
- 0 CPU-migrations # 0.000 M/sec ( +- 49.62% )
- 142 page-faults # 0.000 M/sec ( +- 0.07% )
- 1,671,107,623 cycles # 1.932 GHz ( +- 0.16% ) [28.23%]
- 838,554,329 stalled-cycles-frontend # 50.18% frontend cycles idle ( +- 0.27% ) [28.21%]
- 453,526,560 stalled-cycles-backend # 27.14% backend cycles idle ( +- 0.43% ) [28.33%]
- 1,434,140,915 instructions # 0.86 insns per cycle
- # 0.58 stalled cycles per insn ( +- 0.06% ) [34.01%]
- 279,485,621 branches # 323.053 M/sec ( +- 0.06% ) [33.98%]
- 6,653,998 branch-misses # 2.38% of all branches ( +- 0.16% ) [33.93%]
- 495,463,378 L1-dcache-loads # 572.698 M/sec ( +- 0.05% ) [28.12%]
- 27,903,270 L1-dcache-load-misses # 5.63% of all L1-dcache hits ( +- 0.28% ) [27.84%]
- 885,210 LLC-loads # 1.023 M/sec ( +- 3.21% ) [21.80%]
- 9,479 LLC-load-misses # 1.07% of all LL-cache hits ( +- 0.63% ) [ 5.61%]
- 830,096,007 L1-icache-loads # 959.494 M/sec ( +- 0.08% ) [11.18%]
- 123,728,370 L1-icache-load-misses # 14.91% of all L1-icache hits ( +- 0.06% ) [16.78%]
- 504,932,490 dTLB-loads # 583.643 M/sec ( +- 0.06% ) [22.30%]
- 2,056,069 dTLB-load-misses # 0.41% of all dTLB cache hits ( +- 2.23% ) [22.20%]
- 1,579,410,083 iTLB-loads # 1825.614 M/sec ( +- 0.06% ) [22.30%]
- 394,739 iTLB-load-misses # 0.02% of all iTLB cache hits ( +- 0.03% ) [22.27%]
- 2,286,363 L1-dcache-prefetches # 2.643 M/sec ( +- 0.72% ) [22.40%]
- 776,096 L1-dcache-prefetch-misses # 0.897 M/sec ( +- 1.45% ) [22.54%]
+ 859.259725 task-clock # 0.472 CPUs utilized ( +- 0.24% )
+ 200,165 context-switches # 0.233 M/sec ( +- 0.00% )
+ 0 CPU-migrations # 0.000 M/sec ( +-100.00% )
+ 142 page-faults # 0.000 M/sec ( +- 0.06% )
+ 1,659,371,974 cycles # 1.931 GHz ( +- 0.18% ) [28.23%]
+ 829,806,955 stalled-cycles-frontend # 50.01% frontend cycles idle ( +- 0.32% ) [28.32%]
+ 490,316,435 stalled-cycles-backend # 29.55% backend cycles idle ( +- 0.46% ) [28.34%]
+ 1,445,166,061 instructions # 0.87 insns per cycle
+ # 0.57 stalled cycles per insn ( +- 0.06% ) [34.01%]
+ 282,370,988 branches # 328.621 M/sec ( +- 0.06% ) [33.93%]
+ 5,056,568 branch-misses # 1.79% of all branches ( +- 0.19% ) [33.94%]
+ 500,660,789 L1-dcache-loads # 582.665 M/sec ( +- 0.06% ) [28.05%]
+ 26,802,313 L1-dcache-load-misses # 5.35% of all L1-dcache hits ( +- 0.26% ) [27.83%]
+ 872,571 LLC-loads # 1.015 M/sec ( +- 3.73% ) [21.82%]
+ 9,050 LLC-load-misses # 1.04% of all LL-cache hits ( +- 0.55% ) [ 5.70%]
+ 794,396,111 L1-icache-loads # 924.512 M/sec ( +- 0.06% ) [11.30%]
+ 130,179,414 L1-icache-load-misses # 16.39% of all L1-icache hits ( +- 0.09% ) [16.85%]
+ 511,119,889 dTLB-loads # 594.837 M/sec ( +- 0.06% ) [22.37%]
+ 2,452,378 dTLB-load-misses # 0.48% of all dTLB cache hits ( +- 2.31% ) [22.14%]
+ 1,597,897,243 iTLB-loads # 1859.621 M/sec ( +- 0.06% ) [22.17%]
+ 394,366 iTLB-load-misses # 0.02% of all iTLB cache hits ( +- 0.03% ) [22.24%]
+ 1,897,401 L1-dcache-prefetches # 2.208 M/sec ( +- 0.64% ) [22.38%]
+ 879,391 L1-dcache-prefetch-misses # 1.023 M/sec ( +- 0.90% ) [22.54%]
- 1.847093132 seconds time elapsed ( +- 0.19% )
+ 1.822131534 seconds time elapsed ( +- 0.21% )
=====
As Peter expected, the number of branches is slightly increased:
- 279,485,621 branches # 323.053 M/sec ( +- 0.06% ) [33.98%]
+ 282,370,988 branches # 328.621 M/sec ( +- 0.06% ) [33.93%]
However, looking at the results overall, I see no significant problem
in these scores with this patch set applied. I'd love to hear what the
maintainers think.
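For reference, the small increase should come from guards of the shape
Peter mentioned. Below is a minimal sketch (illustrative only; the
struct and function names are simplified and this is not the actual
patch code) of why the feature, when compiled in but unused, costs
little more than that extra branch per accounting call:

=====
#include <stdint.h>

/*
 * Sketch of the guard pattern: each bandwidth accounting hook bails
 * out early when no quota has been configured for the cfs_rq, so the
 * per-call overhead in the unused case is one (well-predicted) branch.
 */
struct cfs_rq_sketch {
	int     runtime_enabled;	/* nonzero only when a quota is set */
	int64_t runtime_remaining;	/* runtime left in the current period */
};

static void account_runtime_sketch(struct cfs_rq_sketch *cfs_rq,
				   uint64_t delta_exec)
{
	if (!cfs_rq->runtime_enabled)
		return;			/* feature unused: nothing to do */

	cfs_rq->runtime_remaining -= (int64_t)delta_exec;
	if (cfs_rq->runtime_remaining <= 0) {
		/*
		 * The real code would try to refill runtime from the
		 * task group's quota here and throttle the cfs_rq if
		 * none is available.
		 */
	}
}
=====

Since hooks of this kind sit on the scheduling hot path that pipe-test
exercises, a branch-count increase of roughly 1% seems consistent with
the above, while cycles and elapsed time show no regression.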
Thanks,
H.Seto
Attachment:
bwc-pipe-test.tar.bz2
Description: Binary data