Re: [PATCH v1 1/2] perf/core: Use sysctl to turn on/off dropping leaked kernel samples

From: Jin, Yao
Date: Mon Jun 18 2018 - 21:39:12 EST




On 6/18/2018 6:45 PM, Peter Zijlstra wrote:
On Mon, Jun 18, 2018 at 02:55:32PM +0800, Jin, Yao wrote:
Thanks for providing the patch. I understand this approach.

In my opinion, the skid window is from counter overflow to interrupt
delivered. While if the skid window is too *big* (e.g. user -> kernel), it
should be not very useful. So personally, I'd prefer to drop the samples.

I really don't get your insitence on dropping the sample. Dropping
samples is bad. Furthermore, doing what Mark suggests actually improves
the result by reducing the skid, if the event happened before we entered
(as it damn well should) then the user regs, which point at the entry
site, are a better approximation than our in-kernel set.

So not only do you not loose the sample, you actually get a better
sample.


OK, that's fine, thanks!

I guess Mark will post this patch, right?

Anyway looks we don't need following patch (0-stuffing sample->ip to indicate perf tool that it is a leak sample), right?

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 80cca2b..628b515 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6361,6 +6361,21 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
return callchain ?: &__empty_callchain;
}

+static bool sample_is_leaked(struct perf_event *event, struct pt_regs *regs)
+{
+ /*
+ * Due to interrupt latency (AKA "skid"), we may enter the
+ * kernel before taking an overflow, even if the PMU is only
+ * counting user events.
+ * To avoid leaking information to userspace, we must always
+ * reject kernel samples when exclude_kernel is set.
+ */
+ if (event->attr.exclude_kernel && !user_mode(regs))
+ return true;
+
+ return false;
+}
+
void perf_prepare_sample(struct perf_event_header *header,
struct perf_sample_data *data,
struct perf_event *event,
@@ -6480,6 +6495,9 @@ void perf_prepare_sample(struct perf_event_header *header,

if (sample_type & PERF_SAMPLE_PHYS_ADDR)
data->phys_addr = perf_virt_to_phys(data->addr);
+
+ if (sample_is_leaked(event, regs))
+ data->ip = 0;
}

static void __always_inline
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index bfa60bc..1bfb697 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -404,6 +404,7 @@ struct events_stats {
u64 total_aux_lost;
u64 total_aux_partial;
u64 total_invalid_chains;
+ u64 total_dropped_samples;
u32 nr_events[PERF_RECORD_HEADER_MAX];
u32 nr_non_filtered_samples;
u32 nr_lost_warned;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 8b93693..ec923f1 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1269,6 +1269,12 @@ static int machines__deliver_event(struct machines *machines,
++evlist->stats.nr_unprocessable_samples;
return 0;
}
+
+ if (sample->ip == 0) {
+ /* Drop the leaked kernel samples */
+ ++evlist->stats.total_dropped_samples;
+ return 0;
+ }
return perf_evlist__deliver_sample(evlist, tool, event, sample, evsel, machine);
case PERF_RECORD_MMAP:
return tool->mmap(tool, event, sample, machine);

Thanks
Jin Yao