Re: [PATCH 0/9] perf: Adding better precise_ip field handling

From: Stephane Eranian
Date: Wed May 15 2013 - 09:27:18 EST


On Mon, May 13, 2013 at 9:43 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> On Sat, May 11, 2013 at 09:50:08AM +0200, Ingo Molnar wrote:
>> > That's really a red herring: there's absolutely no reason why the
>> > kernel could not pass back the level of precision it provided.
>>
>> All I've been saying is that doing random precision without feedback is
>> confusing.
>
> I agree with that.
>
>> We also don't really have a good feedback channel for this kind of
>> thing. The best I can come up with is tagging each and every sample with
>> the quality it represents. I think we can do with only one extra
>> PERF_RECORD_MISC bit, but it looks like we're quickly running out of
>> those things.
>
> Hm, how about passing precision back to user-space at creation time, in
> the perf_attr data structure? There's no need to pass it back in every
> sample, precision will not really change during the life-time of an event.
>
>> But I think the biggest problem is PEBS's inability to deal with REP
>> prefixes; see this email from Stephane:
>> https://lkml.org/lkml/2011/2/1/177
>>
>> It is really unfortunate for PEBS to have such a side-effect; but it
>> makes all memset/memcpy/memmove things appear like they have no cost.
>> I'm very sure that will surprise a number of people.
>
> I'd expect PEBS to get gradually better.
>
> Note that at least for user-space, REP MOVS is getting rarer. libc uses
> SSE-based memcpy/memset variants - which are not miscounted by PEBS. The
> kernel still uses REP MOVS - but it's a special case because it cannot
> cheaply use vector registers.
>
> The vast majority of code gets measured by cycles:pp more accurately than
> cycles.
>
I don't understand how you arrive at that conclusion. I can show you simple
examples where this is not true at all (even without REP MOVS).

I will repeat once again what PEBS provides. The only guarantee of PEBS
is that it captures the next dynamic address after an instruction that caused
the event to occur.
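
For reference, this is all a tool can express today when it asks for PEBS; a
minimal sketch via perf_event_open(), where the precise_ip levels are the
documented ones (0 = arbitrary skid, 1 = constant skid, 2 = request zero
skid, 3 = require zero skid) and the event choice, period and error handling
are purely illustrative:

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

/* Sketch only: open a PEBS-assisted cycles event at a given precision.
 * precise_ip: 0 = arbitrary skid, 1 = constant skid,
 *             2 = request zero skid, 3 = require zero skid.
 */
static int open_precise_cycles(unsigned int precise_level)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.sample_period = 100000;
        attr.precise_ip = precise_level;        /* cycles:p / :pp / :ppp */
        attr.exclude_kernel = 1;

        /* pid = 0 (self), cpu = -1 (any), no group leader, no flags */
        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
        int fd = open_precise_cycles(2);        /* i.e. cycles:pp */

        if (fd < 0)
                perror("perf_event_open");
        return 0;
}

Nothing in this interface reports back what level of precision the kernel
actually delivered, which is the feedback gap discussed above.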

The address is the 'next' one because PEBS captures at retirement of the
sampled instruction. The caveat is that the sampled instruction is not the
one at the end of the sampling period for this event; it may retire N cycles
later. There is therefore a shadow during which qualifying instructions may
be executed but never sampled. This is what INST_RETIRED:PREC_DIST is trying
to compensate for.
Furthermore, as pointed out recently, the filters are ignored for the
sampled instruction: they are honored up to the sampling period, but the
sampled instruction itself does not qualify for them. Filters can vastly
change the meaning of an event; for instance, cmask=1 changes LLC_MISS
into cycles with pending LLC misses.
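
To illustrate how a filter ends up in the encoding, here is a sketch that
builds the raw config the way the x86 PMU expects it (cmask in bits 24-31,
invert in bit 23). The LLC-miss code below is the architectural 0x2e/0x41
event as I read the SDM, so treat the exact numbers as an assumption:

#include <stdint.h>
#include <stdio.h>

/* Build a raw x86 event config following the IA32_PERFEVTSEL layout:
 *   bits  0-7  event select, bits 8-15 unit mask,
 *   bit  23    invert,       bits 24-31 counter mask (cmask)
 */
static uint64_t x86_raw_event(uint8_t event, uint8_t umask,
                              uint8_t cmask, int inv)
{
        return (uint64_t)event |
               ((uint64_t)umask << 8) |
               ((uint64_t)!!inv << 23) |
               ((uint64_t)cmask << 24);
}

int main(void)
{
        /* Architectural LLC misses event (assumed 0x2e/0x41), unfiltered */
        uint64_t llc_miss = x86_raw_event(0x2e, 0x41, 0, 0);

        /* Same event with cmask=1: cycles with pending LLC misses,
         * a completely different quantity.
         */
        uint64_t llc_miss_cycles = x86_raw_event(0x2e, 0x41, 1, 0);

        printf("LLC_MISS         -> raw config 0x%llx\n",
               (unsigned long long)llc_miss);
        printf("LLC_MISS,cmask=1 -> raw config 0x%llx\n",
               (unsigned long long)llc_miss_cycles);
        return 0;
}

Either value would be passed as attr.config with attr.type = PERF_TYPE_RAW;
only the filter bits differ, yet the meaning of each sample changes entirely.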

I would add that using uops_retired:cmask=16:invert to obtain precise cycles
does not behave the same way across processors: it turns out that Westmere
and SandyBridge handle halted cycles differently. So in the end, I think it
is pretty hard to understand what is being measured uniformly, and therefore
I think it is a VERY bad idea to default cycles to cycles:pp when PEBS is
present.
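
For reference, and with the same caveat that the exact codes should be
checked against the SDM for each model, the precise-cycles proxy I mention
above corresponds to something like this:

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
        struct perf_event_attr attr;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);

        /* Assumed encoding: UOPS_RETIRED (0xc2/0x01) with cmask=16
         * (bits 24-31) and invert (bit 23), i.e. cycles in which fewer
         * than 16 uops retire -- the cycles proxy discussed above.
         */
        attr.type = PERF_TYPE_RAW;
        attr.config = (16ULL << 24) | (1ULL << 23) | (0x01ULL << 8) | 0xc2;
        attr.precise_ip = 2;            /* PEBS-assisted sampling */
        attr.sample_period = 100000;

        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0)
                perror("perf_event_open");
        return 0;
}

Whether the halted-cycles difference matters depends on the workload, which
is exactly why substituting this silently for plain cycles is confusing.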