Re: [RFC][PATCH 3/9] perf: export registerred pmus via sysfs

From: Corey Ashford
Date: Mon May 10 2010 - 19:14:21 EST


On 5/10/2010 4:53 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> On Mon, 2010-05-10 at 13:43 +0200, Ingo Molnar wrote:
>>>
>>> Yeah, we really want a mechanism like this in place instead of continuing with
>>> the somewhat ad-hoc extensions to the event enumeration space.
>>>
>>> One detail: i think we want one more level. Instead of:
>>>
>>>   /sys/devices/system/node/nodeN/node_events
>>>     node_events/event_source_id
>>>     node_events/local_misses
>>>                /local_hits
>>>                /remote_misses
>>>                /remote_hits
>>>                /...
>>>
>>> We want the individual events to be a directory, containing the event_id:
>>>
>>>   /sys/devices/system/node/nodeN/node_events
>>>     node_events/event_source_id
>>>     node_events/local_misses/event_id
>>>                /local_hits/event_id
>>>                /remote_misses/event_id
>>>                /remote_hits/event_id
>>>                /...
>>>
>>> The reason is that we want to keep our options open to add more attributes to
>>> individual events. (In fact extended attributes already exist for certain
>>> event classes - such as the 'format' info for tracepoints.)

Having extra fields for each event would allow us to describe hardware-specific event attributes. For example:

/sys/devices/system/node/nodeN/node_events
  node_events/event_source_id
  node_events/local_misses/event_id
             /local_hits/event_id
             /crypto_datamover              <- specific node PMU
               /marked_crb_rcv_des
                 /event_id
                 /attrib
                   /lpid                    <- attribute name
                   /lpid/type               <- type of attribute (boolean, integer, etc.)
                   /lpid/min                <- min value of int attribute
                   /lpid/max                <- max value of int attribute
                   /lpid/bit_offset         <- amount to shift attribute value before OR'ing into the raw event code
                   /marking_mode            <- attribute name
                   /marking_mode/type
                   /...

Of course, these nodes would need to be replicated for each event that needs them or other attributes.
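
To make the intended use of type/min/max/bit_offset concrete, here is a rough Python sketch of how a user space tool might fold one attribute value into a raw event code. Everything in it is an assumption for illustration: the function name encode_attr, the dict that stands in for the contents of the attribute's sysfs nodes, and the example numbers are all hypothetical, and none of this layout exists in the kernel today.

```python
def encode_attr(raw_event, attr, value):
    """OR one attribute value into a raw event code.

    `attr` is a hypothetical dict mirroring the proposed per-attribute
    nodes, e.g. {"type": "integer", "min": 0, "max": 255, "bit_offset": 32}.
    Assumes attribute values are non-negative.
    """
    if attr["type"] == "boolean":
        # A boolean attribute contributes a single bit.
        value = 1 if value else 0
    elif attr["type"] == "integer":
        # Reject values outside the advertised [min, max] range.
        if not attr["min"] <= value <= attr["max"]:
            raise ValueError("attribute value out of range")
    else:
        raise ValueError("unknown attribute type: %r" % attr["type"])
    # Shift the value into place, then OR it into the raw event code.
    return raw_event | (value << attr["bit_offset"])


# Hypothetical description of the 'lpid' attribute.
lpid = {"type": "integer", "min": 0, "max": 255, "bit_offset": 32}
code = encode_attr(0x1234, lpid, 5)
```

A tool would call this once per attribute the user specifies, starting from the value read out of event_id.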


>>
>> Sure, sounds like a sensible suggestion.
>>
>> One thing I'd also like to clarify is that !raw events should not be
>> exhaustive hardware event lists, those are best left for userspace, but
>> instead are generally useful events that can be expected to be implemented
>> by any hardware of that particular class.

Why exactly is this? I got the impression this was something you and Ingo wanted earlier. Big as the impact will be, it would be nice to unify the two event spaces (generic and raw) into a single space that a user space tool can explore (or even, crudely, /bin/ls).
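
As a sketch of what "explored by a user space tool" could look like, the Python below walks a directory tree and treats every directory that contains an event_id file as an event. The function name list_events is hypothetical, the tree it would walk is the proposed layout rather than anything the kernel exports today, and the assumption that event_id holds a single integer (decimal or 0x-prefixed hex) is mine.

```python
import os


def list_events(root):
    """Map event names (paths relative to root) to their event_id values."""
    events = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "event_id" not in filenames:
            continue  # not an event directory
        name = os.path.relpath(dirpath, root)
        with open(os.path.join(dirpath, "event_id")) as f:
            # Assumed format: one integer, decimal or 0x-prefixed hex.
            events[name] = int(f.read().strip(), 0)
    return events
```

With a unified space, the same walk would find generic and hardware-specific events alike, which is the point of merging them.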

>>
>> So a GPU might have things like 'vsync' and 'cmd_pipeline_stall' or whatever
>> is a generic GPU feature, but not very implementation specific things that
>> the next generation of hardware won't ever have.
>
> Definitely so.
>
> Ingo

Hi Ingo,

In the past, you said that you didn't want anything in user space to enumerate the raw hardware events supported by the kernel. So does the above represent a re-thinking of that position?

We'd like to have hardware-specific symbolic event names in the perf tool by some mechanism, unified or otherwise. Right now we cannot use the perf tool on the IBM Wire-Speed processor, because perf lacks support for symbolic raw event names.

In the meantime, we are using a pair of demo programs from Stephane Eranian's libpfm4 source tree called "task" and "syst". These tools use the symbolic event names provided by libpfm4, along with the kernel support from perf_events.

Regards,

- Corey
