On Thu, 2009-04-02 at 11:12 +0200, Peter Zijlstra wrote:
plain text document attachment (perf_counter_callchain_context.patch)
Put in counts to tell which ips belong to what context.

  -----
   | |  hv
   | --
nr | |  kernel
   | --
   | |  user
  -----

Peter Zijlstra wrote:
Right, just realized that PERF_RECORD_IP needs something similar if one
is not able to derive the context from the IP itself.

On Fri, 2009-04-03 at 11:25 -0700, Corey Ashford wrote:
Three individual bits would suffice, or you could use a two-bit code -
00 = user
01 = kernel
10 = hypervisor
11 = reserved (or perhaps unknown)

Unfortunately, because of alignment, it would need to take up another 64-bit word, wouldn't it? Too bad you cannot sneak the bits into the IP in a machine-independent way.

And since you probably need a separate word, that effectively doubles the amount of space taken up by IP samples (if we add a "no event header" option). Should we add another bit in the record_type field - PERF_RECORD_IP_LEVEL (or similar) so that user-space apps don't have to get this if they don't need it?

On Mon, 2009-04-06 at 13:01 +0200, Peter Zijlstra wrote:
If we limit the event size to 64k (surely enough, right? :-), then we
have 16 more bits to play with in the header, and we could do something
like the below.

A further possibility would also be to add an overflow bit in there,
making the full 32bit PERF_RECORD space available to output events as
well.

Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -201,9 +201,17 @@ struct perf_counter_mmap_page {
 	__u32	data_head;		/* head in the data section */
 };
+enum {
+	PERF_EVENT_LEVEL_HV = 0,
+	PERF_EVENT_LEVEL_KERNEL = 1,
+	PERF_EVENT_LEVEL_USER = 2,
+};
+
 struct perf_event_header {
 	__u32	type;
-	__u32	size;
+	__u16	level		:  2,
+		__reserved	: 14;
+	__u16	size;
 };

Except we should probably use masks again instead of bitfields so that
the thing is portable when streamed to disk, such as would be common
with splice().
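
For illustration, a rough sketch of what the mask variant might look like. The level constants are the ones from the patch above; the struct name, the `misc` field, SKETCH_LEVEL_MASK and the two helpers are made up for this example and are not proposed API. The point is only that the position of the level bits is then fixed by the mask rather than by however the compiler chooses to lay out bitfields.

```c
#include <stdint.h>

/* Level values as in the patch above. */
enum {
	PERF_EVENT_LEVEL_HV	= 0,
	PERF_EVENT_LEVEL_KERNEL	= 1,
	PERF_EVENT_LEVEL_USER	= 2,
};

/* Hypothetical: the low two bits of a 16-bit word carry the level. */
#define SKETCH_LEVEL_MASK	0x0003u

/* Hypothetical stand-in for perf_event_header, without bitfields. */
struct sketch_event_header {
	uint32_t type;
	uint16_t misc;	/* bits 0-1: level, bits 2-15: reserved */
	uint16_t size;	/* 16 bits, so events are limited to 64k */
};

/* Store the level without disturbing the reserved bits. */
static inline void sketch_set_level(struct sketch_event_header *h,
				    unsigned int level)
{
	h->misc = (uint16_t)((h->misc & ~SKETCH_LEVEL_MASK) |
			     (level & SKETCH_LEVEL_MASK));
}

/* Extract the level from the header. */
static inline unsigned int sketch_get_level(const struct sketch_event_header *h)
{
	return h->misc & SKETCH_LEVEL_MASK;
}
```

A reader of the stream would then call sketch_get_level() on each header and compare against PERF_EVENT_LEVEL_*. This only pins down which bits of the 16-bit word are used; the byte order of the word itself is a separate question that masks do not settle.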