[PATCH 1/5] perf: Deny optimized switch for events read by PERF_SAMPLE_READ

From: Jiri Olsa
Date: Fri Aug 22 2014 - 09:07:36 EST


The optimized task context switch for cloned perf events just
swaps whole perf event contexts (of current and next process)
if it finds them suitable. Events from the 'current' context
will now measure data of the 'next' context and vice versa.

This is ok for cases where we are not directly interested in
the event->count value of separate child events, like:
  - standard sampling, where we take the 'period' value for the
    event count
  - counting, where we accumulate all events (children)
    into a single count value

But when an event is read through the PERF_SAMPLE_READ sample
type, we are interested in the direct event->count value measured
in a specific task. Swapping events between tasks for this kind
of measurement corrupts the data.
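
To illustrate the corruption, here is a minimal userspace sketch. It is
not kernel code: the toy_* structures and functions are hypothetical
stand-ins that keep only a single counter per context. It shows that
after a pointer swap the event created for one task keeps accumulating
the other task's activity, so the sum over all events stays right while
each individual count is wrong.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct toy_event { uint64_t count; };
struct toy_ctx   { struct toy_event ev; };
struct toy_task  { struct toy_ctx *ctx; };

/*
 * Model of the optimized switch: exchange the context pointers of the
 * two tasks instead of scheduling every event out and in again.
 */
static void toy_swap_ctx(struct toy_task *prev, struct toy_task *next)
{
	struct toy_ctx *tmp = prev->ctx;

	prev->ctx = next->ctx;
	next->ctx = tmp;
}

/* Activity of a running task lands on whatever event it currently owns. */
static void toy_run(struct toy_task *t, uint64_t n)
{
	assert(t && t->ctx);
	t->ctx->ev.count += n;
}
```

If task A runs for 100 units, the contexts are swapped, and task B then
runs for 7 units, the event originally attached to A holds 107 and the
one originally attached to B holds 0.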

Fix this by incrementing/decrementing pin_count of the perf event
context whenever an inherited event with PERF_SAMPLE_READ is
added/removed. A pin_count value != 0 makes the context unsuitable
for the optimized switch.
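
The accounting can be sketched in userspace as follows. This is a
simplified model, not the kernel implementation: the model_* names and
MODEL_SAMPLE_READ are hypothetical, and model_can_swap() assumes
pin_count is the only condition, whereas the real suitability check
looks at more than that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag standing in for PERF_SAMPLE_READ. */
#define MODEL_SAMPLE_READ (1u << 0)

/* Hypothetical, reduced versions of the event and context structures. */
struct model_event { bool inherit; unsigned int sample_type; };
struct model_ctx   { int pin_count; int nr_events; };

/* Mirrors the has_inherit_read() test added by this patch. */
static bool model_has_inherit_read(const struct model_event *e)
{
	return e->inherit && (e->sample_type & MODEL_SAMPLE_READ);
}

/* Adding such an event pins the context. */
static void model_list_add_event(struct model_ctx *ctx,
				 const struct model_event *e)
{
	if (model_has_inherit_read(e))
		ctx->pin_count++;
	ctx->nr_events++;
}

/* Removing it drops the pin again. */
static void model_list_del_event(struct model_ctx *ctx,
				 const struct model_event *e)
{
	if (model_has_inherit_read(e)) {
		assert(ctx->pin_count > 0);
		ctx->pin_count--;
	}
	ctx->nr_events--;
}

/* Assumption: only pin_count is modelled here. */
static bool model_can_swap(const struct model_ctx *ctx)
{
	return ctx->pin_count == 0;
}
```

Using a counter rather than a flag means the context stays pinned until
the last such event is removed.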

Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Corey Ashford <cjashfor@xxxxxxxxxxxxxxxxxx>
Cc: David Ahern <dsahern@xxxxxxxxx>
Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Jen-Cheng(Tommy) Huang <tommy24@xxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Paul Mackerras <paulus@xxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
---
 kernel/events/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2d7363adf678..a1d220cf739b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1117,6 +1117,12 @@ ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
 		return &ctx->flexible_groups;
 }

+static bool has_inherit_read(struct perf_event *event)
+{
+	return event->attr.inherit &&
+	       (event->attr.sample_type & PERF_SAMPLE_READ);
+}
+
 /*
  * Add a event from the lists for its context.
  * Must be called with ctx->mutex and ctx->lock held.
@@ -1148,6 +1154,9 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 	if (has_branch_stack(event))
 		ctx->nr_branch_stack++;

+	if (has_inherit_read(event))
+		ctx->pin_count++;
+
 	list_add_rcu(&event->event_entry, &ctx->event_list);
 	if (!ctx->nr_events)
 		perf_pmu_rotate_start(ctx->pmu);
@@ -1313,6 +1322,9 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	if (has_branch_stack(event))
 		ctx->nr_branch_stack--;

+	if (has_inherit_read(event))
+		ctx->pin_count--;
+
 	ctx->nr_events--;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat--;
--
1.8.3.1
