Re: [RFC][PATCH 00/18] Increase resolution of load weights

From: Nikhil Rao
Date: Tue Apr 26 2011 - 12:11:53 EST


On Wed, Apr 20, 2011 at 11:16 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Nikhil Rao <ncrao@xxxxxxxxxx> wrote:
>
>> Major TODOs:
>> - Detect overflow in update shares calculations (time * load), and set load_avg
>>   to maximum possible value (~0ULL).
>> - tg->task_weight uses an atomic which needs to be updated to 64-bit on 32-bit
>>   machines. Might need to add a lock to protect this instead of atomic ops.
>> - Check wake-affine math and effective load calculations for overflows.
>> - Needs more testing, and we need to ensure fairness/balancing is not broken.
>
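
On the overflow TODO: a saturating multiply is probably the simplest
check here. A minimal sketch of the kind of thing I have in mind
(illustrative only -- not code from the current patch set):

  #include <stdint.h>

  /* Multiply time by load, clamping to ~0ULL instead of wrapping. */
  static inline uint64_t sat_mul_u64(uint64_t time, uint64_t load)
  {
          /* The product overflows iff load != 0 and time > UINT64_MAX / load. */
          if (load && time > UINT64_MAX / load)
                  return UINT64_MAX;      /* i.e. ~0ULL */
          return time * load;
  }
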
> Please measure micro-costs accurately as well, via perf stat --repeat 10 or so.
>
> For example, on a test system, 200k pipe-triggered context switches (100k
> pipe ping-pongs) cost this much:
>
> $ taskset 1 perf stat --repeat 10 ./pipe-test-100k
>
>        630.908390 task-clock-msecs         #     0.434 CPUs    ( +-  0.499% )
>           200,001 context-switches         #     0.317 M/sec   ( +-  0.000% )
>                 0 CPU-migrations           #     0.000 M/sec   ( +- 66.667% )
>               145 page-faults              #     0.000 M/sec   ( +-  0.253% )
>     1,374,978,900 cycles                   #  2179.364 M/sec   ( +-  0.516% )
>     1,373,646,429 instructions             #     0.999 IPC     ( +-  0.134% )
>       264,223,224 branches                 #   418.798 M/sec   ( +-  0.134% )
>        16,613,988 branch-misses            #     6.288 %       ( +-  0.755% )
>           204,162 cache-references         #     0.324 M/sec   ( +- 18.805% )
>             5,152 cache-misses             #     0.008 M/sec   ( +- 21.280% )
>
> We want to know the delta in the 'instructions' value resulting from the patch
> (this can be measured very accurately) and we also want to see the 'cycles'
> effect - both can be measured pretty accurately.
>
> I've attached the testcase - you might need to increase the --repeat value so
> that noise drops below the level of the effect from these patches. (the effect
> is likely in the 0.01% range)
>

Thanks for the test program. Sorry for the delay in getting back to
you with results. I had some trouble wrangling machines :-(
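
For anyone reading along without the attachment: the test is a simple
pipe ping-pong between a parent and a child. A minimal sketch of the
idea (my reconstruction, not Ingo's exact testcase):

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>

  #define LOOPS 100000

  int main(void)
  {
          int ping[2], pong[2];
          char c = 'x';
          int i;

          if (pipe(ping) || pipe(pong)) {
                  perror("pipe");
                  return 1;
          }

          if (fork() == 0) {
                  /* child: wait for a byte, bounce it back */
                  for (i = 0; i < LOOPS; i++) {
                          if (read(ping[0], &c, 1) != 1 ||
                              write(pong[1], &c, 1) != 1)
                                  exit(1);
                  }
                  exit(0);
          }

          /* parent: send a byte, wait for the reply */
          for (i = 0; i < LOOPS; i++) {
                  if (write(ping[1], &c, 1) != 1 ||
                      read(pong[0], &c, 1) != 1)
                          return 1;
          }
          wait(NULL);
          return 0;
  }

Each iteration forces two context switches (parent -> child -> parent),
so 100k ping-pongs give the ~200k context switches Ingo quotes above.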

I have data from pipe-test-100k on 32-bit builds below. I ran the
test 5000 times on each kernel with just two events (instructions,
cycles) configured, since the test machine does not have enough PMU
counters to measure the full event set without scaling.

$ taskset 1 perf stat --repeat 5000 -e instructions,cycles ./pipe-test-100k

baseline (v2.6.39-rc4):

Performance counter stats for './pipe-test-100k' (5000 runs):

       994,061,050 instructions             #     0.412 IPC     ( +-  0.133% )
     2,414,463,154 cycles                                       ( +-  0.056% )

       2.251820874 seconds time elapsed                         ( +-  0.429% )

kernel + patch:

Performance counter stats for './pipe-test-100k' (5000 runs):

     1,064,610,666 instructions             #     0.435 IPC     ( +-  0.086% )
     2,448,568,573 cycles                                       ( +-  0.037% )

       1.704553841 seconds time elapsed                         ( +-  0.288% )

We see a ~7.1% increase in instructions executed and a ~1.4% increase
in cycles. IPC also rises, by ~5.6%, which is understandable: the
patched kernel retires noticeably more instructions in only slightly
more cycles. I can't explain why elapsed time drops by about 0.5s,
though.
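
For reference, the deltas above work out to:

  instructions: (1,064,610,666 - 994,061,050)   / 994,061,050   = +7.10%
  cycles:       (2,448,568,573 - 2,414,463,154) / 2,414,463,154 = +1.41%
  IPC:          (0.435 - 0.412) / 0.412                         = +5.58%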

> It would also be nice to see how 'size vmlinux' changes with these patches
> applied, on a 'make defconfig' build.
>

With a defconfig build, we see a marginal increase in vmlinux text
size (+3049 bytes, +0.043%) and a small decrease in data size (-4040
bytes, -0.57%).
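
These are plain size(1) numbers; the comparison can be reproduced with
something like:

  $ size vmlinux-2.6.39-rc4 vmlinux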

baseline (v2.6.39-rc4):
   text    data     bss     dec    hex filename
7025688  711604 1875968 9613260 92afcc vmlinux-2.6.39-rc4

kernel + patch:
   text    data     bss     dec    hex filename
7028737  707564 1875968 9612269 92abed vmlinux

-Thanks
Nikhil

> Thanks,
>
>        Ingo
>