Re: [RFC PATCH 0/3] Perf persistent events

From: Ingo Molnar
Date: Mon Mar 18 2013 - 04:40:20 EST

* Borislav Petkov <bp@xxxxxxxxx> wrote:

> From: Borislav Petkov <bp@xxxxxxx>
>
> Yeah,
>
> here's a refresh of the persistent events deal, accessing those is much
> cleaner now. Here's how:
>
> So kernel code initializes and enables the event at its convenience
> (during boot, whenever) and userspace goes and says:
>
>     sys_perf_event_open(pattr,...)
>
> with pattr.persistent = 1. Userspace gets the persistent buffer file
> descriptor to read from. Without that, we get a normal perf file
> descriptor for the duration of the tracing.
>
> This saves all the diddling of trying to hand down file descriptors
> through debugfs or whatever. Instead, current perf code simply can use
> it.
>
> This is still RFC but things are starting to fall into place slowly. As
> always, any and all comments/suggestions are welcome.
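
For reference, the userspace side described in the quoted text could look
roughly like the sketch below. It is only a sketch under assumptions: the
'persistent' attribute bit exists in this RFC series and not in the mainline
linux/perf_event.h, so it appears as a comment; the helper names init_attr()
and open_event() are made up for illustration; and the software CPU-clock
event is an arbitrary placeholder, not the event the patches target.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill in a minimal attribute block. */
void init_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;	/* placeholder event type */
	attr->config = PERF_COUNT_SW_CPU_CLOCK;	/* placeholder event */
	attr->disabled = 1;
	/*
	 * With the RFC patches applied, one would additionally set
	 *	attr->persistent = 1;
	 * That bit is an assumption taken from the cover letter; it is
	 * not part of the mainline header, so it stays a comment here.
	 */
}

/*
 * Hypothetical helper: glibc provides no wrapper for perf_event_open,
 * so the call goes through syscall(2). Returns a file descriptor on
 * success or -1 with errno set.
 */
int open_event(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	return (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
}
```

A caller would run init_attr(&attr), then open_event(&attr, 0, -1) to
measure the calling task on any CPU, and read from the returned descriptor.
Whether the call succeeds depends on the running kernel and on the
perf_event_paranoid setting.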

That definitely looks interesting and desirable. It would be nice to have
more generic/flexible semantics by using the VFS for tracing context.

That would allow 'stateful tracing', and not just in a kernel initiated
fashion: we could basically do ftrace-alike tracing, into persistent,
VFS-named buffers.

The question is, how are the individual buffers identified when and after
they have been created? An option would be to use cgroups for that -
cgroups already has its own VFS and syscall interfaces. But maybe some
other, explicit interface is needed (eventfs).

All the usecases we talked about in the past would work fine that way:

- the MCE events would show up as an already created set of buffers,
  discoverable via the VFS interface.

- user-space could generate more 'tracing/profiling contexts' at runtime.

- a boot tracer would activate via a boot option, and it would create a
  tracing context - visible via the VFS interface.

- modern RAS daemon replacing mcelog
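
If tracing contexts were exposed as directory entries, discovering them from
user-space could be as plain as a readdir() loop. The sketch below assumes
that each entry under some mount point stands for one tracing context; no
such filesystem exists in mainline (the 'eventfs' name above is only a
proposal), so the root path is left to the caller and the helper name
list_contexts() is hypothetical.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: print every entry under 'root' and return how
 * many were found, or -1 if the directory cannot be opened. Treating
 * each entry as one tracing context is an assumption, not an existing
 * kernel interface.
 */
int list_contexts(const char *root)
{
	DIR *dir = opendir(root);
	struct dirent *de;
	int n = 0;

	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		/* Skip the "." and ".." entries. */
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		printf("%s/%s\n", root, de->d_name);
		n++;
	}

	closedir(dir);
	return n;
}
```

A tool could call this on the mount point at startup to find the buffers
that already exist, such as the MCE ones, before it creates any of its own.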

If you make that work, along with a new perf tool side that allows the
creation of a tracing context (and a separate extraction as well), via a
modified 'perf trace' or a new subcommand, that would be a major,
upstream-worthy perf feature IMO which would go way beyond the RAS usecase.

Such a feature would become a popular instrumentation tool pretty quickly.

