Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2

From: Stephane Eranian
Date: Fri Apr 22 2011 - 05:41:15 EST


On Fri, Apr 22, 2011 at 11:23 AM, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Stephane Eranian <eranian@xxxxxxxxxx> wrote:
>
>> On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@xxxxxxx> wrote:
>> >
>> > * Ingo Molnar <mingo@xxxxxxx> wrote:
>> >
>> >> This needs to be a *lot* more user friendly. Users do not want to type in
>> >> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
>> >> era really.
>> >>
>> >> Unless there's proper generalized and human usable support i'm leaning
>> >> towards turning off the offcore user-space accessible raw bits for now, and
>> >> use them only kernel-internally, for the cache events.
>>
>> Generic cache events are a myth. They are not usable. I keep getting
>> questions from users because nobody knows what they are actually counting,
>> thus nobody knows how to interpret the counts. You cannot really hide the
>> micro-architecture if you want to make any sensible measurements.
>
> Well:
>
>  aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
>  Time: 0.125
>  Time: 0.136
>  Time: 0.180
>  Time: 0.103
>  Time: 0.097
>  Time: 0.125
>  Time: 0.104
>  Time: 0.125
>  Time: 0.114
>  Time: 0.158
>
>  Performance counter stats for './hackbench 10' (10 runs):
>
>      2,102,556,398  instructions             #  0.000 IPC  ( +-  1.179% )
>        843,957,634  L1-dcache-loads                        ( +-  1.295% )
>        130,007,361  L1-dcache-load-misses                  ( +-  3.281% )
>          6,328,938  LLC-misses                             ( +-  3.969% )
>
>        0.146160287  seconds time elapsed                   ( +-  5.851% )
>
> It's certainly useful if you want to get ballpark figures about cache behavior
> of an app and want to do comparisons.
>
What can you conclude from the above counts?
Are they good or bad? If they are bad, how do you go about fixing the app?
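
To make the question concrete, here is a minimal sketch of the ratios one
can derive from the quoted counts. The variable and function names are
illustrative assumptions, not part of perf, and the numbers are copied by
hand from the output above. The sketch only does the arithmetic; it does
not say what each generic event actually counts on a given processor, nor
whether the resulting ratios are good or bad.

```python
# Counts copied by hand from the quoted perf stat output (10-run averages).
# The dictionary keys mirror the event names used on the command line.
counts = {
    "instructions": 2_102_556_398,
    "L1-dcache-loads": 843_957_634,
    "L1-dcache-load-misses": 130_007_361,
    "LLC-misses": 6_328_938,
}


def ratio(numerator: str, denominator: str) -> float:
    """Return counts[numerator] / counts[denominator]."""
    return counts[numerator] / counts[denominator]


# Fraction of L1 data cache loads that were reported as misses.
l1_miss_ratio = ratio("L1-dcache-load-misses", "L1-dcache-loads")

# Last-level cache misses per 1000 instructions (MPKI).
llc_mpki = 1000 * ratio("LLC-misses", "instructions")

# Reported L1 data cache loads per instruction.
loads_per_insn = ratio("L1-dcache-loads", "instructions")

print(f"L1 load miss ratio:     {l1_miss_ratio:.1%}")
print(f"LLC misses per 1K insn: {llc_mpki:.2f}")
print(f"L1 loads per insn:      {loads_per_insn:.2f}")
```

These ratios are easy to compute, but interpreting them still depends on
knowing which hardware events sit behind each generic name.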

> There are inconsistencies in our generic cache events - but that's not really a
> reason to obscure their usage behind nonsensical microarchitecture-specific
> details.
>
The actual events are a reflection of the micro-architecture. They indirectly
describe how it works. It is not clear to me that you can really improve your
app without some exposure to the micro-architecture.

So if you want to have generic events, I am fine with this, but you should not
block access to actual events pretending they are useless. Some people are
certainly interested in using them and learning about the micro-architecture
of their processor.


> But i'm definitely in favor of making these generalized events more consistent
> across different CPU types. Can you list examples of inconsistencies that we
> should resolve? (and which you possibly consider impossible to resolve, right?)
>
To make generic events more uniform across processors, one would have to have
precise definitions as to what they are supposed to count. Once you have that,
then we may have a better chance at finding consistent mappings for each
processor. I have not yet seen such definitions.