Re: [PATCH 1/2] perf: Add persistent events

From: Peter Zijlstra
Date: Tue May 25 2010 - 11:00:14 EST


On Tue, 2010-05-25 at 09:32 +0200, Borislav Petkov wrote:
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Sun, May 23, 2010 at 09:23:21PM +0200
>
> > Either we add some notifier thing, or we simply add an explicit call in
> > the init sequence after the perf_event subsystem is running. I would
> > suggest we start with some explicit call, and take it from there.
>
> Ok, this couldn't be more straightforward. So I looked at the init
> sequence we do when booting wrt perf/ftrace initialization:
>
> start_kernel
> ...
> |-> sched_init
> |-> perf_event_init
> ...
> |-> ftrace_init
> rest_init
> kernel_init
> |-> do_pre_smp_initcalls
> |...
> |-> smp_init
> |-> do_basic_setup
> |-> do_initcalls
>
> and one of the convenient places after both perf is initialized and
> ftrace has enumerated the tracepoints is do_initcalls() (It cannot be an
> early_initcall since at that time we're not running SMP yet and we want
> the MCE event per cpu.)
>
> So I added a core_initcall that registers the mce perf event. This makes
> it more or less a persistent event without any changes to the perf_event
> subsystem. I guess this should work - at least it builds here, will give
> it a run later.
>
> As a further enhancement, the init-function should read out all the
> logged mce events which survived the warm reboot and those which happen
> between mce init and the actual event registration so that perf can
> postprocess those too at a more convenient time.
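
To make the ordering argument above concrete, here is a toy userspace
model of it. This is only a sketch, not kernel code: mce_event_init(),
toy_boot() and the reduced set of levels are made-up names for
illustration, and the real initcall machinery uses linker sections
rather than a table. It only shows that a core-level initcall runs after
perf_event_init, ftrace_init and smp_init, while an early initcall would
run before SMP is up.

```c
#include <stdio.h>
#include <string.h>

typedef int (*initcall_t)(void);

/* Reduced, illustrative set of levels; the kernel has more. */
enum level { LEVEL_EARLY, LEVEL_CORE, LEVEL_DEVICE };

struct initcall {
	enum level level;
	initcall_t fn;
};

static char boot_log[128];
static int smp_up;
/* 1: registered with SMP running, -1: registered too early, 0: not yet */
static int mce_event_registered;

static void log_step(const char *s)
{
	strncat(boot_log, s, sizeof(boot_log) - strlen(boot_log) - 1);
}

/* Hypothetical stand-in for the initcall that registers the MCE event. */
static int mce_event_init(void)
{
	mce_event_registered = smp_up ? 1 : -1;
	log_step("mce_event_init;");
	return 0;
}

/* Stand-in for the table the linker builds from core_initcall() etc. */
static const struct initcall calls[] = {
	{ LEVEL_CORE, mce_event_init },
};

static void run_level(enum level lvl)
{
	size_t i;

	for (i = 0; i < sizeof(calls) / sizeof(calls[0]); i++)
		if (calls[i].level == lvl)
			calls[i].fn();
}

/* Mirrors the order of the call chain quoted above. */
void toy_boot(void)
{
	boot_log[0] = '\0';
	smp_up = 0;
	mce_event_registered = 0;

	log_step("perf_event_init;");	/* from start_kernel */
	log_step("ftrace_init;");
	run_level(LEVEL_EARLY);		/* do_pre_smp_initcalls */
	smp_up = 1;
	log_step("smp_init;");
	run_level(LEVEL_CORE);		/* do_initcalls */
	run_level(LEVEL_DEVICE);
}
```

Moving the table entry to LEVEL_EARLY in this model leaves
mce_event_registered at -1, which is the "not running SMP yet" case.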

Right, so that looks good. Now the interesting part is twofold:

1) expose these perf_events to userspace. Since they're now created
in-kernel, there is no user-space access point to them. One way
would be to extend the perf syscall to allow attaching to an
existing instance (but that would limit us to a single instance per
'attr'); another would be to create some /debug or /sys iteration of
all such events.


2) get these things a buffer. perf_events as created don't actually
have an output buffer; normally that is created at mmap() time, but
since you cannot mmap() a kernel-side event, it never gets one.
This could be done by extracting perf_mmap_data_alloc() into a
sensible interface.

