Re: [announce] Performance Counters for Linux, v6

From: Corey Ashford
Date: Mon Jan 26 2009 - 14:14:15 EST


Ingo Molnar wrote:
* stephane eranian <eranian@xxxxxxxxxxxxxx> wrote:

Hi,

Corey brings up an interesting problem which I wanted to comment on.

The current proposal hinges on the idea that by interpreting a single value the kernel can understand what the user wants to measure. For instance, if I pass type=0, then the kernel understands I want to measure CPU_CYCLES. Given that the number of events and their unit mask combinations can be large, the proposal also provides a "raw" mode, where the content of the type field is interpreted as the raw value to put into a register.

This is where there is an issue: with several PMU models, including on x86, the raw bit plus a 64-bit value is not enough to figure out what the user wants to measure. This happens when the PMU has more than just counters. Thus, interpreting each raw value as the event code may be wrong. To remain on familiar territory, the Nehalem uncore PMU has an opcode matcher register that uses a 64-bit value. On AMD64 Family 10h, you have IBS. I could also give examples on Itanium, with opcode matchers and range restrictions. Corey provided other examples for Power. The API has to provide a way to express what the raw value is meant for: counter, matcher, filter...
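To make the ambiguity concrete, here is a minimal sketch in C. Every name in it (event_request, tagged_request, raw_target and so on) is hypothetical and chosen for illustration only; none of it is the structure layout from the actual patch set. It only shows that a raw bit plus one value leaves the kernel guessing, while an explicit tag does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the proposed ABI. */
enum generic_event { EV_CPU_CYCLES = 0 };   /* type=0, as in the mail */

/* What a raw value might be meant for on a given PMU. */
enum raw_target { RAW_COUNTER_CONFIG, RAW_OPCODE_MATCHER, RAW_FILTER };

/* The scheme under discussion: one value plus a raw bit. */
struct event_request {
    bool     raw;    /* set: 'type' holds a raw register value */
    uint64_t type;   /* generic event id, or the raw value */
};

/* One possible fix (an assumption): say what the raw value is for. */
struct tagged_request {
    struct event_request req;
    enum raw_target      target;
};

/* With only the raw bit and the value, nothing in the request says
 * which register is meant, so the only available reading is
 * "counter configuration". */
static enum raw_target guess_target(const struct event_request *r)
{
    (void)r;   /* the value itself carries no hint */
    return RAW_COUNTER_CONFIG;
}

/* With an explicit tag, the same raw value is unambiguous. */
static enum raw_target explicit_target(const struct tagged_request *t)
{
    return t->req.raw ? t->target : RAW_COUNTER_CONFIG;
}
```

A value meant for an opcode matcher would be treated as an event code by guess_target(), while explicit_target() returns whatever the caller tagged it as.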

This can be done in a number of ways (in order of increasing levels of abstraction):

- the raw type is kept wide enough. Paul already requested the raw type
to be widened to 128 bits to express certain PowerPC features.

- or the PMU capability is expressed as a special counter type (if it's
useful enough) - and then either the write() method or ioctl is extended
to express attributes we want to set/change while a counter is running.

- or the highest level counter / hw event data type is extended with new
attribute field(s).

My feeling is that we generally want such hw features to start small - i.e. at the raw type level initially. Then we can allow them to climb the ladder, if they prove their utility in practice. We've got space reserved in the ABI to allow for growth like this.

Ingo


Hi Ingo and Stephane,

Thanks for the replies.

I think any one of those solutions would work for Power's Instruction Matching Register. If more than one register needs to be programmed, or the values don't fit into the 128-bit raw event type, we could use the "special counter" approach.
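The rule of thumb above can be written down as a small C sketch. The names (reg_value, choose_approach, RAW_EVENT_BITS) are hypothetical and not part of any posted patch; the only figure taken from the thread is the 128-bit raw width.

```c
#include <stddef.h>
#include <stdint.h>

/* Width of the widened raw type mentioned in the thread. */
#define RAW_EVENT_BITS 128u

enum approach { USE_RAW_TYPE, USE_SPECIAL_COUNTER };

/* Hypothetical descriptor for one register a tool wants to program. */
struct reg_value {
    unsigned bits;   /* significant width of the value, in bits */
};

/* Use the raw type only when exactly one register is involved and its
 * value fits in the raw field; otherwise fall back to a special
 * counter type. nregs is assumed to be at least 1. */
static enum approach choose_approach(const struct reg_value *regs,
                                     size_t nregs)
{
    if (nregs != 1)
        return USE_SPECIAL_COUNTER;
    if (regs[0].bits > RAW_EVENT_BITS)
        return USE_SPECIAL_COUNTER;
    return USE_RAW_TYPE;
}
```

A single 64-bit matcher value would stay at the raw level here, while two registers, or one value wider than 128 bits, would move to the special counter type.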

I will have another look at the Power PMU description and see if there are other constraints that might cause us to want to go one way or the other, or perhaps a different way.

Regards,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
503-578-3507
cjashfor@xxxxxxxxxx

