Re: [PATCH v1 00/19] Increase resolution of load weights

From: Nikhil Rao
Date: Tue May 03 2011 - 21:14:09 EST


On Tue, May 3, 2011 at 5:58 PM, Nikhil Rao <ncrao@xxxxxxxxxx> wrote:
> On Sun, May 1, 2011 at 11:14 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
>>
>> * Nikhil Rao <ncrao@xxxxxxxxxx> wrote:
>>
>>> 1. Performance costs
>>>
>>> Ran 50 iterations of Ingo's pipe-test-100k program (100k pipe ping-pongs).
>>> See http://thread.gmane.org/gmane.linux.kernel/1129232/focus=1129389 for more
>>> info.
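>>>
>>> The core of the test is a pipe ping-pong roughly along these lines (a
>>> simplified sketch, not the exact pipe-test-100k source; that is at the
>>> link above):
>>>
>>>   /* Parent and child bounce one byte over two pipes; with both tasks
>>>    * pinned to one CPU, every hop forces a context switch. */
>>>   #include <stdlib.h>
>>>   #include <unistd.h>
>>>
>>>   int main(void)
>>>   {
>>>           int ping[2], pong[2], i;
>>>           char c = 0;
>>>
>>>           if (pipe(ping) || pipe(pong))
>>>                   exit(1);
>>>
>>>           if (fork() == 0) {
>>>                   /* child: echo every byte straight back */
>>>                   for (i = 0; i < 100000; i++) {
>>>                           if (read(ping[0], &c, 1) != 1 ||
>>>                               write(pong[1], &c, 1) != 1)
>>>                                   exit(1);
>>>                   }
>>>                   exit(0);
>>>           }
>>>
>>>           /* parent: send a byte, wait for the echo */
>>>           for (i = 0; i < 100000; i++) {
>>>                   if (write(ping[1], &c, 1) != 1 ||
>>>                       read(pong[0], &c, 1) != 1)
>>>                           return 1;
>>>           }
>>>           return 0;
>>>   }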
>>>
>>> 64-bit build.
>>>
>>>   2.6.39-rc5 (baseline):
>>>
>>>     Performance counter stats for './pipe-test-100k' (50 runs):
>>>
>>>            905,034,914  instructions           #    0.345 IPC    ( +-  0.016% )
>>>          2,623,924,516  cycles                                   ( +-  0.759% )
>>>
>>>           1.518543478  seconds time elapsed                      ( +-  0.513% )
>>>
>>>   2.6.39-rc5 + patchset:
>>>
>>>     Performance counter stats for './pipe-test-100k' (50 runs):
>>>
>>>            905,351,545  instructions           #    0.343 IPC    ( +-  0.018% )
>>>          2,638,939,777  cycles                                   ( +-  0.761% )
>>>
>>>           1.509101452  seconds time elapsed                      ( +-  0.537% )
>>>
>>> There is a marginal increase in instructions retired, about 0.034%, and a
>>> marginal increase in cycles counted, about 0.57%.
>>
>> Not sure this increase is statistically significant: both effects are within
>> the noise, and looking at elapsed time, it actually went down.
>>
>> Btw., to best measure context-switching costs you should do something like:
>>
>>   taskset 1 perf stat --repeat 50 ./pipe-test-100k
>>
>> to pin both tasks to the same CPU. This reduces noise and makes the numbers
>> more relevant: the SMP costs, which do not increase due to your patchset,
>> drop out of the measurement.
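>>
>> (The "1" there is a CPU affinity mask, so this pins everything to CPU 0;
>> "taskset -c 0" is the equivalent using a CPU list.)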
>>
>> So it would be nice to re-run the 64-bit tests with the pipe test bound to a
>> single CPU.
>
> I re-ran the 64-bit tests with the pipe test bound to a single CPU.
> Data attached below.
>
> 2.6.39-rc5:
>
>   Performance counter stats for './pipe-test-100k' (100 runs):
>
>          855,571,900  instructions           #    0.869 IPC    ( +-  0.637% )
>          984,213,635  cycles                                   ( +-  0.254% )
>
>          0.796129773  seconds time elapsed                     ( +-  0.152% )
>
> 2.6.39-rc5 + patchset:
>
>   Performance counter stats for './pipe-test-100k' (100 runs):
>
>          905,553,828  instructions           #    0.934 IPC    ( +-  0.059% )
>          969,792,787  cycles                                   ( +-  0.168% )
>
>          0.788676004  seconds time elapsed                     ( +-  0.122% )
>
>
> There is a 5.8% increase in instructions, which is statistically
> significant and well over the error margins. Cycles dropped by about
> 1.5% and elapsed time dropped by about 1%. I'm looking into profiles
> for this test to understand why the instruction count has increased.
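>
> For the profiling I plan to use something along these lines (standard
> perf tooling; the exact events and options may vary):
>
>   taskset 1 perf record -e instructions ./pipe-test-100k
>   perf report --sort symbol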
>
>>
>>> 32-bit build.
>>>
>>>   2.6.39-rc5 (baseline):
>>>
>>>     Performance counter stats for './pipe-test-100k' (50 runs):
>>>
>>>          1,025,151,722  instructions           #    0.238 IPC    ( +-  0.018% )
>>>          4,303,226,625  cycles                                   ( +-  0.524% )
>>>
>>>           2.133056844  seconds time elapsed                      ( +-  0.619% )
>>>
>>>   2.6.39-rc5 + patchset:
>>>
>>>     Performance counter stats for './pipe-test-100k' (50 runs):
>>>
>>>          1,070,610,068  instructions           #    0.239 IPC    ( +-  1.369% )
>>>          4,478,912,974  cycles                                   ( +-  1.011% )
>>>
>>>           2.293382242  seconds time elapsed                      ( +-  0.144% )
>>>
>>> On 32-bit kernels, instructions retired increase by about 4.4% with this
>>> patchset, and CPU cycles also increase by about 4%.
>>>
>>
>> These results look more bothersome: a clear increase in cycles, elapsed
>> time, and instructions retired, all well beyond measurement noise.
>>
>> Given that scheduling costs are roughly 30% of that pipe test-case, the cost
>> increase to the scheduler is probably around:
>>
>>         instructions:  +14.5%
>>         cycles:        +13.3%
>>
>> That is rather significant.
>>
>
> I'll take a closer look at the performance of this patchset this week.
> I'm a little confused about how you calculated the cost to the
> scheduler. How did you come up with 14.5% and 13.3%?

Ah, never mind that. After reading your mail again, I see how this is
calculated now.
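
Spelling it out for the archives: it's the whole-workload delta divided
by the scheduler's ~30% share of the test, i.e. roughly:

    instructions:  ~4.4% / 0.30  ->  ~+14.5%
    cycles:        ~4.0% / 0.30  ->  ~+13.3%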

> Also, out of
> curiosity, what's an acceptable tolerance level for a performance hit
> on 32-bit?
>
> -Thanks
> Nikhil
>
>> Thanks,
>>
>>        Ingo
>>
>