Re: [PATCH 5/6] perf_counter: add more context information

From: Corey Ashford
Date: Mon Apr 06 2009 - 16:16:47 EST


Peter Zijlstra wrote:
On Mon, 2009-04-06 at 11:53 -0700, Corey Ashford wrote:
Peter Zijlstra wrote:
On Mon, 2009-04-06 at 13:01 +0200, Peter Zijlstra wrote:
On Fri, 2009-04-03 at 11:25 -0700, Corey Ashford wrote:
Peter Zijlstra wrote:
On Thu, 2009-04-02 at 11:12 +0200, Peter Zijlstra wrote:
plain text document attachment (perf_counter_callchain_context.patch)
Put in counts to tell which ips belong to what context.

-----
| | hv
| --
nr | | kernel
| --
| | user
-----
Right, just realized that PERF_RECORD_IP needs something similar if one
is not able to derive the context from the IP itself.

Three individual bits would suffice, or you could use a two-bit code -
00 = user
01 = kernel
10 = hypervisor
11 = reserved (or perhaps unknown)

Unfortunately, because of alignment, it would need to take up another 64-bit word, wouldn't it? Too bad you cannot sneak the bits into the IP in a machine-independent way.

And since you probably need a separate word, that effectively doubles the amount of space taken up by IP samples (if we add a "no event header" option). Should we add another bit in the record_type field - PERF_RECORD_IP_LEVEL (or similar) so that user-space apps don't have to get this if they don't need it?
If we limit the event size to 64k (surely enough, right? :-), then we
have 16 more bits to play with in the header, and we could do something
like the below.

A further possibility would also be to add an overflow bit in there,
making the full 32bit PERF_RECORD space available to output events as
well.

Index: linux-2.6/include/linux/perf_counter.h
===================================================================
--- linux-2.6.orig/include/linux/perf_counter.h
+++ linux-2.6/include/linux/perf_counter.h
@@ -201,9 +201,17 @@ struct perf_counter_mmap_page {
 	__u32   data_head;		/* head in the data section */
 };
+enum {
+	PERF_EVENT_LEVEL_HV	= 0,
+	PERF_EVENT_LEVEL_KERNEL	= 1,
+	PERF_EVENT_LEVEL_USER	= 2,
+};
+
 struct perf_event_header {
 	__u32	type;
-	__u32	size;
+	__u16	level		:  2,
+		__reserved	: 14;
+	__u16	size;
 };
Except we should probably use masks again instead of bitfields so that
the thing is portable when streamed to disk, such as would be common
with splice().
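
For illustration, the mask-based variant could look roughly like the
sketch below. This is only an assumption about how it might be done: the
struct name, the "misc" field, the mask and both helpers are made up
here and are not part of the patch above. The point is that a mask fixes
where the level bits sit inside the 16-bit word, whereas the placement
of C bitfields within their storage unit is left to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask: the low two of the 16 spare bits carry the level. */
#define PERF_EVENT_MISC_LEVEL_MASK	0x0003u

enum {
	PERF_EVENT_LEVEL_HV	= 0,
	PERF_EVENT_LEVEL_KERNEL	= 1,
	PERF_EVENT_LEVEL_USER	= 2,
};

/* Still 8 bytes: 32-bit type, 16 misc bits, 16-bit size (max 64k). */
struct perf_event_header_sketch {
	uint32_t type;
	uint16_t misc;	/* hypothetical field replacing the bitfields */
	uint16_t size;
};

/* Store a level in the misc word, leaving the other 14 bits untouched. */
static inline uint16_t misc_set_level(uint16_t misc, unsigned int level)
{
	assert(level <= PERF_EVENT_MISC_LEVEL_MASK);
	return (uint16_t)((misc & ~PERF_EVENT_MISC_LEVEL_MASK) | level);
}

/* Extract the level from the misc word. */
static inline unsigned int misc_get_level(uint16_t misc)
{
	return misc & PERF_EVENT_MISC_LEVEL_MASK;
}
```

Byte order of the 16-bit word would still have to be agreed on when the
stream moves between machines; the masks only settle the bit positions.
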
One downside of this approach is that if you specify "no header" (currently not possible, but maybe later?), you will not be able to get the level bits.

Would this be desirable? I know we've mentioned it before, but it would
mean one cannot mix various event types (currently that means !mmap and
callchain with difficulty).

I think it would. For one use case I'm working on right now, simple profiling, all I need are IPs. If I could omit the header, that would reduce the frequency of SIGIOs by a factor of three, and make it faster to read the IPs when the SIGIOs occur.

I realize that it makes it impossible to mix record types with the header removed, and skipping over the call chain data a bit more difficult (but not rocket science).
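
To make the "skipping over the call chain" part concrete, here is a
rough sketch of walking a headerless buffer. The layout is purely an
assumption for illustration (each sample is an ip word, then a count
word, then that many call chain words); it is not the format the kernel
emits, and next_sample() and collect_ips() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed headerless layout, in 64-bit words:
 *   word 0:          sampled ip
 *   word 1:          nr, the number of call chain entries that follow
 *   words 2..nr+1:   call chain ips
 *
 * Returns the index of the next sample, or 0 if the sample at pos is
 * truncated or its chain would run past the end of the buffer.
 */
static size_t next_sample(const uint64_t *buf, size_t pos, size_t len)
{
	assert(pos <= len);
	if (len - pos < 2)
		return 0;
	if (buf[pos + 1] > len - pos - 2)
		return 0;
	return pos + 2 + (size_t)buf[pos + 1];
}

/* Copy only the sampled ips into out, skipping each call chain. */
static size_t collect_ips(const uint64_t *buf, size_t len,
			  uint64_t *out, size_t max)
{
	size_t pos = 0, n = 0;

	while (pos < len && n < max) {
		size_t next = next_sample(buf, pos, len);

		if (next == 0)
			break;	/* malformed tail: stop reading */
		out[n++] = buf[pos];
		pos = next;
	}
	return n;
}
```

Without a size field in a header, the reader has to trust the count word
to find the next sample, which is why the bounds checks are there.
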

It could be made an error for the caller to specify both "no header" and perf_counter_hw_event.mmap|munmap.



As long as we mandate this header, we can have 16 misc bits.


True.

- Corey
