Re: [PATCH 06/14] perf, x86: PEBS infrastructure

From: Peter Zijlstra
Date: Fri Mar 05 2010 - 04:20:25 EST


On Fri, 2010-03-05 at 17:19 +1100, Paul Mackerras wrote:
> On Thu, Mar 04, 2010 at 03:00:52PM +0100, Peter Zijlstra wrote:
>
> > Implement a simple PEBS model that always takes a single PEBS event at
> > a time. This is done so that the interaction with the rest of the
> > system is as expected (freq adjust, period randomization, lbr).
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> > LKML-Reference: <new-submission>
> > ---
>
> ...
>
> > @@ -203,8 +203,9 @@ struct perf_event_attr {
> >  		enable_on_exec : 1, /* next exec enables */
> >  		task           : 1, /* trace fork/exit */
> >  		watermark      : 1, /* wakeup_watermark */
> > +		precise        : 1, /* OoO invariant counter */
>
> Could you explain in a bit more detail what this means?
>
> Also, it would be good to mention the ABI addition in the patch
> description, and explain it briefly there.

Quite so, my bad.

So on Intel, regular PMIs can arrive several instructions after the
actual event because of out-of-order processing of the instruction
stream. The hardware doesn't keep the IP of the instruction that
triggered the event, so all we have is the IP at which the interrupt
happened (the difference between these two IPs is called skid).

Now Intel came up with something called Precise Event Based Sampling
(PEBS), which stores a (partial) register set in a memory buffer at
event time (trap-like, for some daft reason).

So from that we can obtain the IP of the instruction _after_ the
instruction that caused the event. This is reliably so (mostly [*]) and
does not contain out-of-order artifacts (0-skid).

So the ->precise flag tells us to use a more precise sampling method if
available on the hardware (AMD could be using IBS to implement this for
their instruction counter).

If you look at patch 9/14 you'll see we use the Last Branch Recording
(LBR) facility of the Intel CPUs (patch 8/14) to find the start of the
last basic block in the instruction stream, and use that to rewind the
PEBS IP to the actual instruction that triggered the event. When that
works I also set PERF_RECORD_MISC_EXACT to indicate we got the IP
dead on (mostly [*]).
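
To make the rewind concrete, here is a rough standalone sketch of the
idea. It is not the code from patch 9/14: pebs_rewind_ip(), insn_len_fn
and toy_insn_len() are names made up for illustration, and the toy
decoder stands in for a real x86 instruction-length decoder. It assumes
the LBR gives us the target of the last branch (the start of the basic
block) and that PEBS reports the IP one instruction past the one we
want.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder callback: length in bytes of the instruction at
 * 'ip', or 0 if it cannot be decoded. */
typedef size_t (*insn_len_fn)(uint64_t ip);

/*
 * Rewind a trap-like PEBS IP to the instruction that caused the event.
 *
 * block_start: start of the last basic block, i.e. the target address
 *              of the most recent LBR entry.
 * pebs_ip:     IP reported by PEBS, one instruction past the one we want.
 *
 * Returns 1 and stores the exact IP in *exact_ip on success. Returns 0
 * if the walk fails; the caller then keeps pebs_ip and must not claim
 * the sample is exact.
 */
static int pebs_rewind_ip(uint64_t block_start, uint64_t pebs_ip,
                          insn_len_fn len_of, uint64_t *exact_ip)
{
    uint64_t ip = block_start;

    assert(len_of != NULL && exact_ip != NULL);

    /*
     * The event instruction must lie in [block_start, pebs_ip). If
     * pebs_ip == block_start, the event instruction was the branch
     * itself; this sketch does not handle that case.
     */
    if (block_start >= pebs_ip)
        return 0;

    /* Decode forward from the block start until the next instruction
     * would begin exactly at pebs_ip. */
    while (ip < pebs_ip) {
        size_t len = len_of(ip);

        if (len == 0)
            return 0;           /* undecodable instruction */
        if (ip + len == pebs_ip) {
            *exact_ip = ip;     /* found the instruction before pebs_ip */
            return 1;
        }
        ip += len;
    }
    return 0;                   /* overshot: decode went out of sync */
}

/* Toy decoder for illustration only: a basic block at 0x1000 holding
 * instructions of 3, 1, 5 and 2 bytes. */
static size_t toy_insn_len(uint64_t ip)
{
    switch (ip) {
    case 0x1000: return 3;
    case 0x1003: return 1;
    case 0x1004: return 5;
    case 0x1009: return 2;
    default:     return 0;
    }
}
```

With the toy layout, a PEBS IP of 0x1009 and a block start of 0x1000
rewinds to 0x1004. Walking forward from the block start is needed
because x86 instructions are variable length, so you can't reliably
decode backwards from the PEBS IP.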

I suspect CPUs that are strictly in-order, like Atom, might always have
it right, but I need to validate that.

Does that clarify stuff?

[*] there are CPU errata that may delay the PEBS recording, mostly with
instructions like MOV SS, STI and things like SMM.
