Re: [PATCH v1 00/19] Increase resolution of load weights

From: Nikhil Rao
Date: Tue May 03 2011 - 20:59:24 EST

On Sun, May 1, 2011 at 11:14 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Nikhil Rao <ncrao@xxxxxxxxxx> wrote:
>
>> 1. Performance costs
>>
>> Ran 50 iterations of Ingo's pipe-test-100k program (100k pipe ping-pongs).
>> See http://thread.gmane.org/gmane.linux.kernel/1129232/focus=1129389 for more
>> info.
>>
>> 64-bit build.
>>
>>   2.6.39-rc5 (baseline):
>>
>>     Performance counter stats for './pipe-test-100k' (50 runs):
>>
>>        905,034,914 instructions             #      0.345 IPC     ( +-   0.016% )
>>      2,623,924,516 cycles                                        ( +-   0.759% )
>>
>>         1.518543478  seconds time elapsed   ( +-   0.513% )
>>
>>   2.6.39-rc5 + patchset:
>>
>>     Performance counter stats for './pipe-test-100k' (50 runs):
>>
>>        905,351,545 instructions             #      0.343 IPC     ( +-   0.018% )
>>      2,638,939,777 cycles                                        ( +-   0.761% )
>>
>>         1.509101452  seconds time elapsed   ( +-   0.537% )
>>
>> There is a marginal increase in instructions retired, about 0.034%, and a
>> marginal increase in cycles counted, about 0.57%.
>
> Not sure this increase is statistically significant: both effects are within
> noise and look at elapsed time, it actually went down.
>
> Btw., to best measure context-switching costs you should do something like:
>
>   taskset 1 perf stat --repeat 50 ./pipe-test-100k
>
> to pin both tasks to the same CPU. This reduces noise and makes the numbers
> more relevant: SMP costs do not increase due to your patchset.
>
> So it would be nice to re-run the 64-bit tests with the pipe test bound to a
> single CPU.

I re-ran the 64-bit tests with the pipe test bound to a single CPU.
Data attached below.

2.6.39-rc5:

  Performance counter stats for './pipe-test-100k' (100 runs):

       855,571,900 instructions             #      0.869 IPC     ( +-   0.637% )
       984,213,635 cycles                                        ( +-   0.254% )

        0.796129773  seconds time elapsed   ( +-   0.152% )

2.6.39-rc5 + patchset:

  Performance counter stats for './pipe-test-100k' (100 runs):

       905,553,828 instructions             #      0.934 IPC     ( +-   0.059% )
       969,792,787 cycles                                        ( +-   0.168% )

        0.788676004  seconds time elapsed   ( +-   0.122% )

There is a 5.8% increase in instructions, which is statistically
significant and well over the error margins. Cycles dropped by about
1.47% and elapsed time dropped by about 1%. I'm looking into profiles
for this test to understand why the instruction count has increased.
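
For anyone who wants to re-derive the relative changes from the raw
counters, here is a minimal sketch. The numbers are copied from the two
pinned 64-bit runs above; the helper name percent_change is only
illustrative and is not part of perf or the patchset.

```python
# Counter values copied from the two pinned 64-bit perf stat runs.
baseline = {"instructions": 855_571_900, "cycles": 984_213_635, "seconds": 0.796129773}
patched = {"instructions": 905_553_828, "cycles": 969_792_787, "seconds": 0.788676004}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Per-metric change, patched kernel relative to baseline.
changes = {name: percent_change(baseline[name], patched[name]) for name in baseline}

# Instructions per cycle, as perf stat reports in its IPC column.
ipc = {
    "baseline": baseline["instructions"] / baseline["cycles"],
    "patched": patched["instructions"] / patched["cycles"],
}

for name, delta in changes.items():
    print(f"{name:>12}: {delta:+.2f}%")
for label, value in ipc.items():
    print(f"{label:>12}: {value:.3f} IPC")
```

A positive value means the patched kernel's counter is higher than the
baseline's; a negative value means it is lower.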

>
>> 32-bit build.
>>
>>   2.6.39-rc5 (baseline):
>>
>>     Performance counter stats for './pipe-test-100k' (50 runs):
>>
>>      1,025,151,722 instructions             #      0.238 IPC     ( +-   0.018% )
>>      4,303,226,625 cycles                                        ( +-   0.524% )
>>
>>         2.133056844  seconds time elapsed   ( +-   0.619% )
>>
>>   2.6.39-rc5 + patchset:
>>
>>     Performance counter stats for './pipe-test-100k' (50 runs):
>>
>>      1,070,610,068 instructions             #      0.239 IPC     ( +-   1.369% )
>>      4,478,912,974 cycles                                        ( +-   1.011% )
>>
>>         2.293382242  seconds time elapsed   ( +-   0.144% )
>>
>> On 32-bit kernels, instructions retired increases by about 4.4% with this
>> patchset. CPU cycles also increases by about 4%.
>
> These results look more bothersome: a clear increase in cycles, elapsed
> time, and instructions retired, well beyond measurement noise.
>
> Given that scheduling costs are roughly 30% of that pipe test-case, the cost
> increase to the scheduler is probably around:
>
>         instructions:   +14.5%
>         cycles:         +13.3%
>
> That is rather significant.
>

I'll take a closer look at the performance of this patchset this week.
I'm a little confused about how you calculated the cost to the
scheduler. How did you come up with 14.5% and 13.3%? Also, out of
curiosity, what's an acceptable tolerance level for a performance hit
on 32-bit?
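
One possible reading of those figures, which is only a guess and is not
confirmed anywhere in this thread: if scheduling is roughly 30% of the
test and the whole increase is attributed to the scheduler, the
scheduler-only increase is the whole-test increase divided by 0.30. A
minimal sketch using the rounded 32-bit percentages quoted above (the
function name is hypothetical):

```python
# Assumption: the scheduler accounts for roughly 30% of the pipe test.
scheduler_share = 0.30

# Whole-test increases on 32-bit, in percent, as rounded in the report.
overall_increase = {"instructions": 4.4, "cycles": 4.0}


def scheduler_increase(overall_pct, share):
    """Scale a whole-test increase up to the scheduler-only portion.

    Assumes every extra instruction or cycle comes from scheduler code.
    """
    return overall_pct / share


estimates = {
    name: scheduler_increase(pct, scheduler_share)
    for name, pct in overall_increase.items()
}

for name, pct in estimates.items():
    print(f"{name:>12}: +{pct:.1f}%")
```

Under that assumption the cycles estimate comes out at 13.3%, and the
instructions estimate lands near, but not exactly on, 14.5%, so the
inputs actually used may have been rounded differently.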

-Thanks
Nikhil

> Thanks,
>
>         Ingo
>