Re: [PATCH v6 3/4] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter

From: xiakaixu
Date: Wed Aug 05 2015 - 22:49:57 EST


ä 2015/8/5 21:53, Peter Zijlstra åé:
> On Wed, Aug 05, 2015 at 12:04:25PM +0200, Peter Zijlstra wrote:
>> Also, you probably want a WARN_ON(in_nmi()) there, this function is
>> _NOT_ NMI safe.
>
> I had a wee think about that, and I think the below is safe.
>
> (with the obvious problem that WARN from NMI context is not safe)
>
> It does not give you up-to-date overcommit times but your version didn't
> either so I'm assuming you don't need those, if you do need those it
> needs more but we can do that too.
>
> ---
> include/linux/perf_event.h | 1 +
> kernel/events/core.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2027809433b3..64e821dd64f0 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -659,6 +659,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr,
> void *context);
> extern void perf_pmu_migrate_context(struct pmu *pmu,
> int src_cpu, int dst_cpu);
> +extern u64 perf_event_read_local(struct perf_event *event);
> extern u64 perf_event_read_value(struct perf_event *event,
> u64 *enabled, u64 *running);
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 39753bfd9520..7105d37763c1 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3222,6 +3222,59 @@ static inline u64 perf_event_count(struct perf_event *event)
> return __perf_event_count(event);
> }
>
> +/*
> + * NMI-safe method to read a local event, that is an event that
> + * is:
> + * - either for the current task, or for this CPU
> + * - does not have inherit set, for inherited task events
> + * will not be local and we cannot read them atomically
> + * - must not have a pmu::count method
> + */
> +u64 perf_event_read_local(struct perf_event *event)
> +{
> + unsigned long flags;
> + u64 val;
> +
> + /*
> + * Disabling interrupts avoids all counter scheduling (context
> + * switches, timer based rotation and IPIs).
> + */
> + local_irq_safe(flags);

s/local_irq_safe/local_irq_save, and I have compiled and tested this function
and it is fine. Will use it in the next set.

Thanks.
> +
> + /* If this is a per-task event, it must be for current */
> + WARN_ON_ONCE((event->attach_state & PERF_ATTACH_TASK) &&
> + event->hw.target != current);
> +
> + /* If this is a per-CPU event, it must be for this CPU */
> + WARN_ON_ONCE(!(event->attach_state & PERF_ATTACH_TASK) &&
> + event->cpu != smp_processor_id());
> +
> + /*
> + * It must not be an event with inherit set, we cannot read
> + * all child counters from atomic context.
> + */
> + WARN_ON_ONCE(event->attr.inherit);
> +
> + /*
> + * It must not have a pmu::count method, those are not
> + * NMI safe.
> + */
> + WARN_ON_ONCE(event->pmu->count);
> +
> + /*
> + * If the event is currently on this CPU, its either a per-task event,
> + * or local to this CPU. Furthermore it means its ACTIVE (otherwise
> + * oncpu == -1).
> + */
> + if (event->oncpu == smp_processor_id())
> + event->pmu->read(event);
> +
> + val = local64_read(&event->count);
> + local_irq_restore(flags);
> +
> + return val;
> +}
> +
> static u64 perf_event_read(struct perf_event *event)
> {
> /*
>
> .
>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/