Re: [RFC] perf_events: support for uncore a.k.a. nest units

From: Corey Ashford
Date: Wed Jan 20 2010 - 18:23:55 EST




On 1/20/2010 1:33 PM, Peter Zijlstra wrote:
> On Wed, 2010-01-20 at 14:34 +0100, Peter Zijlstra wrote:
>
>> So how about PERF_TYPE_{CORE,NODE,SOCKET} like things?
>
> OK, so I read most of the intel uncore stuff, and it seems to suggest
> you need a regular pmu event to receive uncore events (chained setup),
> this seems rather retarded since it wastes a perfectly good pmu event
> and makes configuring all this more intricate...
>
> A well, nothing to be done about that I guess..

Yes, we have a similar situation. In addition to events that are counted on core PMU counters, we also have counters that are off-core. In some cases those counters sit in off-core units which take events from other off-core units as well as counting their own. So you can see that this can get almost arbitrarily complex.

As for the PERF_TYPE_{CORE,NODE,SOCKET} idea, that could still work, even though, for example, a socket event may be counted on a core PMU. Using more encodings for the type field, as you've suggested, would allow us to reuse the 64-bit config space multiple times. Were you thinking that with the type field we'd still reuse the "cpu" argument for the actual PMU address within the PERF_TYPE_* space? If so, that's an interesting idea, but I think it still leaves open the problem of how to actually relate those addresses to the real hardware, especially when running under a hypervisor that has given you only a small subset of the physical hardware in the system.
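
To make sure I'm reading the proposal correctly, here is a rough sketch in Python (for brevity only) of the interpretation I have in mind: the type selects a namespace, the 64-bit config is decoded separately within each namespace, and the "cpu" argument doubles as a PMU address for the non-core types. The PerfType values, the EventRequest class and the describe() helper are all made up for illustration; none of them exist in perf_events.

```python
from dataclasses import dataclass
from enum import Enum


class PerfType(Enum):
    # Hypothetical type encodings; not part of any existing perf_events ABI.
    CORE = 0
    NODE = 1
    SOCKET = 2


@dataclass(frozen=True)
class EventRequest:
    ptype: PerfType  # selects the namespace in which 'config' is decoded
    config: int      # 64-bit event code, reusable once per type
    cpu: int         # CPU number for CORE; assumed PMU address otherwise

    def __post_init__(self):
        if not 0 <= self.config < 2**64:
            raise ValueError("config must fit in 64 bits")


def describe(req: EventRequest) -> str:
    """Show that the same (config, cpu) pair means different things per type."""
    if req.ptype is PerfType.CORE:
        return f"core event {req.config:#x} on cpu {req.cpu}"
    kind = req.ptype.name.lower()
    return f"{kind} event {req.config:#x} on {kind} pmu {req.cpu}"
```

Note that nothing here says which physical socket "socket pmu 1" actually is, which is exactly the gap I'm worried about.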

I really think we need some sort of data structure, passed from the kernel to user space, that represents the topology of the system and gives enough information to identify each PMU node. Whether this is done with a sysfs-style tree, a table in a file, XML, etc. doesn't really matter much, but it needs to be something that can be parsed relatively easily and *contains just enough information* for the user to be able to correctly choose PMUs, and for the kernel to be able to relate that back to actual PMU hardware.
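
As an illustration only, suppose the kernel exported a flat table with one PMU per line. The column layout (id, kind, parent) and the kind names below are invented, and the sketch assumes the table is well formed (no parent cycles). The point is just that a few fields per PMU are enough for a tool to pick out the PMUs that sit under a given piece of hardware.

```python
from typing import Dict, List, NamedTuple


class PmuNode(NamedTuple):
    pmu_id: int
    kind: str    # e.g. "core", "socket", "node" (hypothetical names)
    parent: int  # pmu_id of the enclosing unit, -1 for a root


def parse_topology(text: str) -> Dict[int, PmuNode]:
    """Parse lines of the assumed form '<id> <kind> <parent>'."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pmu_id, kind, parent = line.split()
        nodes[int(pmu_id)] = PmuNode(int(pmu_id), kind, int(parent))
    return nodes


def pmus_under(nodes: Dict[int, PmuNode], root: int, kind: str) -> List[int]:
    """Return ids of PMUs of the given kind at or below 'root'."""
    def below(n: PmuNode) -> bool:
        # Walk up the parent chain until we hit 'root' or run out of parents.
        while n.pmu_id != root:
            if n.parent not in nodes:
                return False
            n = nodes[n.parent]
        return True

    return sorted(i for i, n in nodes.items() if n.kind == kind and below(n))
```

A tool could then ask for, say, pmus_under(nodes, 0, "core") to get the core PMUs belonging to unit 0, and the kernel would only need to map those ids back to hardware.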

In our case, we are looking at /proc/device-tree, and it does appear to contain enough information for us. However, /proc/device-tree is available only on the Power architecture (it originates from a data structure passed into the OS from Open Firmware), so we'd like to have a more general approach that can also be used on x86 and other arches.
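
For what it's worth, pulling candidate nodes out of /proc/device-tree takes very little code, since each node is a directory and each property is a file. A minimal sketch, in which the function name and the property name used in the usage note are placeholders I made up rather than real bindings:

```python
import os
from typing import List


def find_nodes_with_property(root: str, prop: str) -> List[str]:
    """Return paths (relative to root) of tree nodes that carry 'prop'.

    Assumes the device-tree layout: directories are nodes, files are
    properties.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if prop in filenames:
            matches.append(os.path.relpath(dirpath, root))
    return sorted(matches)
```

On a Power system this would be called as find_nodes_with_property("/proc/device-tree", "example-pmu-prop"), with the placeholder replaced by whatever property actually marks a PMU-bearing unit.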

- Corey
