Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types

From: Alexei Starovoitov
Date: Tue May 30 2017 - 11:53:21 EST


On 5/29/17 2:39 AM, Peter Zijlstra wrote:


Do we want something like the below to replace much of the above?

if (!perf_event_valid_local(event, NULL, cpu))
goto err_out;

Seems to be roughly what you're after, although I suppose @cpu might be
hard to determine a priory, so maybe we should allow a magic value to
short-circuit that test.

---
kernel/events/core.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8d6acaeeea17..a7dc34f19568 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3630,6 +3630,36 @@ static inline u64 perf_event_count(struct perf_event *event)
}

/*
+ * perf_event_valid_local() - validates if the event is usable by perf_event_read_local()
+ * event: the event to validate
+ * task: the task the @event will be used in
+ * cpu: the cpu the @event will be used on
+ *
+ * In case one wants to disallow all per-task events, use @task = NULL.
+ * In case one wants to disallow all per-cpu events, use @cpu = -1.
+ */
+bool perf_event_valid_local(struct perf_event *event, struct task_struct *task, int cpu)
+{
+ /* See perf_event_read_local() for the reasons for these tests */
+
+ if ((event->attach_state & PERF_ATTACH_TASK) &&
+ event->hw.target != task)
+ return false;
+
+ if (!(event->attach_state & PERF_ATTACH_TASK) &&
+ event->cpu != cpu)
+ return false;

we do if (unlikely(event->oncpu != cpu))
as dynamic check inside bpf_perf_event_read(), since we cannot do it
statically at perf_event_array update time.
If we drop the above 'if' and keep 'task==null' trick,
then indeed we can use this function as static check.

Right now we're trying to keep as many checks as possible as
static checks to make bpf_perf_event_read() faster.
I guess we can drop that approach and do perf_event_valid_local()
check for every read since perf_event_read_local() does all the
same checks anyway.
So how about converting all WARN_ON in perf_event_read_local()
into 'return -EINVAL' and change func proto into:
int perf_event_read_local(struct perf_event *even, u64 *counter_val)

> I cannot find reason for this comment. That is, why would
> perf_event_read_local() not support those two types?

I don't know. What is the meaning of
reading tracepoint/breakpoint counter?
Because of 'event->oncpu != cpu' dynamic check all counters are
expected to be per-cpu. I'm not sure how uncore counters work.
What do they have in event->oncpu? -1? I guess they have pmu->count?
So we cannot read them from bpf program anyway?

If we change warn_ons in perf_event_read_local() to returns
them we can make per-task counters working.
User side will open per-task counter and bpf program will
do current->pid != expected_pid check before calling
bpf_perf_event_read(). bpf scripts often do that already.

int perf_event_read_local(struct perf_event *even, u64 *counter_val)
{
local_irq_save(flags);
if ((event->attach_state & PERF_ATTACH_TASK) &&
event->hw.target != current)
return -EINVAL;

if (!(event->attach_state & PERF_ATTACH_TASK) &&
event->cpu != smp_processor_id()
return -EINVAL;
... inherit and pmu->count checks here ...
*counter_val = local64_read(&event->count)
local_irq_restore(flags);
return 0;
}

thoughts?