Re: [GIT PULL] perf changes for v3.8

From: David Ahern
Date: Sun Dec 16 2012 - 23:45:11 EST


On 12/13/12 10:31 AM, Ingo Molnar wrote:
* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
So the default shouldn't necessarily be "include guest". The default
should presumably be "the user didn't say", and then the kernel does
whatever works best.

If the user actually explicitly says one or the other, we should try
to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
that particular combination that you explicitly asked for").

That should make everybody happy. Doing a non-PEBS virtualized perf
run should still work with the old binary.

So there should be two bits: "include guest" (V in the event specifier
unless you already used that for something else) and "host only" (H),
and they should both default to off. Then the kernel can see the three
actual cases.

(Or four cases, if you really want to: you may or may not want to make
the "both V and H set means both, and _only_ V set means 'no host at
all, _only_ virtual environment'. So then ":ppV" would mean
"cycle-accurate for virtual box _only_", while ":ppVH" would mean
"cycle-accurate for both the host and the virtual box". Of course,
considering the PEBS interface, right now neither of those can
actually work, but plain ":V" and ":HV" could work).

The important thing, I think, is that if the user doesn't know
or care about the VM case (because he's not running any!) and
doesn't specify, then the kernel should not say EOPNOTSUPP,
and should do whatever works for that cpu.

Agreed.

David, wanna send a patch for this?

As I mentioned in a prior email exclude_{guest,host} work currently work fine without PEBS. The current matrix for the flags:

profiling
guest host
-e <event> y y
-e <event>:G y n - G means enable guest, turn off host
-e <event>:H n y - H means enable host, turn off guest
-e <event>:GH y y - G followed by H means enable both
-e <event>:HG y y - same as GH

There is no reason to change how these work. It's the variants with :p that need to be handled:

-e <event>:p n y - guest off is required
-e <event>:pG y n - needs to fail - not supported
-e <event>:pH n y
-e <event>:pGH y y - needs to fail - not supported

This is the logic that was implemented in the original patchset which was pulled into v3.7 and the cause of this email thread.

One suggestion was to switch exclude_guest to include_guest. I take that to mean deprecate the current exclude_guest and add a new include_guest flag. Given that there are a number of exclude_XXXX flags (XXXX = user, kernel, host, guest, hv, etc) that would make the perf code inconsistent.

All that is needed is for the current exclude_guest flag to be deprecated such that for older binaries on newer kernels it is ignored (perhaps a warn on once), and then a new flag -- exclude_guest2 -- is then used for the new logic.

e.g.,

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..19900df 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -266,12 +266,14 @@ struct perf_event_attr {
sample_id_all : 1, /* sample_type all events */

exclude_host : 1, /* don't count in host */
- exclude_guest : 1, /* don't count in guest */
+ exclude_guest : 1, /* don't count in guest - DEPRECATED */

exclude_callchain_kernel : 1, /* exclude kernel callchains */
exclude_callchain_user : 1, /* exclude user callchains */

- __reserved_1 : 41;
+ exclude_guest2 : 1, /* don't count in guest */
+
+ __reserved_1 : 40;

union {
__u32 wakeup_events; /* wakeup every n events */


Do you agree with that?

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/