Re: [RFC] Perf: Trigger and dump sample info to perf.data from user space ring buffer

From: Borislav Petkov
Date: Mon Sep 21 2015 - 11:12:40 EST


On Mon, Sep 21, 2015 at 09:54:39PM +0800, Yunlong Song wrote:
> [Problem Background]
>
> We want to run perf in daemon mode and collect the traces when an exception
> (e.g., a machine crash or an app performance drop) appears. Perf may have to run
> for a long time (days, weeks, or even months), since we do not know when the
> exception will appear, only that it will appear at some point (especially
> for a beta product). If we simply use "perf record" as usual, we run into two

We do have patches to add perf persistent events which can run for
longer than the profiling session. We wanted to use those for RAS:

http://lwn.net/Articles/593655/

We just need to get them upstream. I guess due to lack of time and other, more
important issues, we get preempted each time ... :-\

(Leaving in the rest for reference).

> problems as time goes by: (1) writing perf.data generates a lot of I/O, which
> may hurt performance significantly; (2) perf.data keeps growing. Although we
> can use eBPF to reduce the traces in the normal case, in our case perf runs in
> daemon mode for a long time, so the traces still accumulate over time.
>
>
> [One Solution]
>
> In fact, we only need the sample info generated during a window just before
> the exception appears; we do not care about samples from other times. So
> perhaps we should change the way perf produces its perf.data as follows:
> 1 Let perf allocate a user space ring buffer of a reasonable size, big enough
> to hold all the tracing info we care about from the period just before the
> exception appears;
> 2 Write the sample info into the user space ring buffer. Since the buffer has a
> fixed size, newer sample info overwrites the oldest sample info;
> 3 When a trigger caused by the exception fires (e.g., an eBPF event, a signal
> or socket communication), dump all the tracing info in the user space ring
> buffer to perf.data.sample.TIME#.
>
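A minimal C sketch of steps 1 and 2: a fixed-size buffer in which new records evict the oldest ones. It assumes each record is stored behind a 4-byte length prefix; the names ovw_ring, ring_push and ring_dump are hypothetical and are not existing perf code. Whole records are evicted, so a dump never starts in the middle of a record:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical overwrite ring buffer holding whole variable-length records.
 * Each record is stored as a 4-byte length followed by its payload. */
struct ovw_ring {
	unsigned char *buf;
	size_t cap;   /* capacity in bytes */
	size_t head;  /* offset of the oldest record */
	size_t used;  /* bytes currently occupied */
};

/* Copy n bytes into the ring at off, wrapping at the end. */
static void copy_in(struct ovw_ring *r, size_t off, const void *src, size_t n)
{
	const unsigned char *s = src;

	for (size_t i = 0; i < n; i++)
		r->buf[(off + i) % r->cap] = s[i];
}

/* Copy n bytes out of the ring starting at off, wrapping at the end. */
static void copy_out(const struct ovw_ring *r, size_t off, void *dst, size_t n)
{
	unsigned char *d = dst;

	for (size_t i = 0; i < n; i++)
		d[i] = r->buf[(off + i) % r->cap];
}

int ring_init(struct ovw_ring *r, size_t cap)
{
	r->buf = malloc(cap);
	r->cap = cap;
	r->head = 0;
	r->used = 0;
	return r->buf ? 0 : -1;
}

void ring_free(struct ovw_ring *r)
{
	free(r->buf);
	r->buf = NULL;
}

/* Evict the oldest record. */
static void drop_oldest(struct ovw_ring *r)
{
	uint32_t len;

	copy_out(r, r->head, &len, sizeof(len));
	r->head = (r->head + sizeof(len) + len) % r->cap;
	r->used -= sizeof(len) + len;
}

/* Append a record, evicting the oldest records until it fits.
 * Returns -1 if the record is larger than the whole buffer. */
int ring_push(struct ovw_ring *r, const void *rec, uint32_t len)
{
	size_t need = sizeof(len) + (size_t)len;
	size_t tail;

	if (need > r->cap)
		return -1;
	while (r->cap - r->used < need)
		drop_oldest(r);
	tail = (r->head + r->used) % r->cap;
	copy_in(r, tail, &len, sizeof(len));
	copy_in(r, tail + sizeof(len), rec, len);
	r->used += need;
	return 0;
}

/* Write the payloads of all stored records, oldest first, to out
 * (e.g. a freshly opened perf.data.sample.TIME#). Returns the record count. */
size_t ring_dump(const struct ovw_ring *r, FILE *out)
{
	size_t off = r->head, left = r->used, nrec = 0;

	while (left > 0) {
		uint32_t len;

		copy_out(r, off, &len, sizeof(len));
		for (uint32_t i = 0; i < len; i++)
			fputc(r->buf[(off + sizeof(len) + i) % r->cap], out);
		off = (off + sizeof(len) + len) % r->cap;
		left -= sizeof(len) + len;
		nrec++;
	}
	return nrec;
}
```

With a 32-byte buffer holding "AAAA", "BBBBBBBB" and "CCCC", pushing "DDDD" evicts only "AAAA", and a subsequent ring_dump() writes the three surviving records oldest first.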
>
> [Use Style]
>
> We can add an option (such as "-M size" or "--memory size") to define the
> size of the user space ring buffer and activate the user space ring buffer mode
> described above. For convenience, we can also add --daemon to make perf run as a
> daemon.
> # perf record -M size -e bpf.o -e cycles -g -F 100 -a sleep 1000000
> Or
> # perf record -M size -e bpf.o -e cycles -g -F 100 -a --daemon
>
> When the exception appears, a signal is sent to perf (an eBPF event or socket
> communication could also be used):
> # kill -SIGUSR1 1234
> # ls
> perf.data.auxiliary perf.data.sample.TIME1
>
> When the 2nd exception appears
> # kill -SIGUSR1 1234
> # ls
> perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2
>
> ......
>
> When the nth exception appears
> # kill -SIGUSR1 1234
> # ls
> perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2 ... perf.data.sample.TIMEn
>
> We can use perf report or perf script to analyze each perf.data.sample.TIME#.
>
> Finally, we can kill perf and combine perf.data.auxiliary with all the
> perf.data.sample.TIME# files to create an all-in-one perf.data:
> # kill -SIGUSR2 1234
> # ls
> perf.data
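One way the SIGUSR1/SIGUSR2 triggers above could be wired up, as a hedged C sketch: the handler only sets flags, because file I/O is not async-signal-safe, and the record loop does the actual dump. Using seconds since the epoch for TIME# is an assumption, as are the helper names install_triggers, take_dump_request and sample_file_name:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Set from the signal handler, polled by the (hypothetical) record loop. */
static volatile sig_atomic_t dump_requested;   /* SIGUSR1: dump the ring buffer */
static volatile sig_atomic_t finish_requested; /* SIGUSR2: merge files and exit */

static void on_trigger(int sig)
{
	/* Only async-signal-safe work here: set a flag and return. */
	if (sig == SIGUSR1)
		dump_requested = 1;
	else if (sig == SIGUSR2)
		finish_requested = 1;
}

int install_triggers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_trigger;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0; /* no SA_RESTART: a blocking poll() fails with EINTR */
	if (sigaction(SIGUSR1, &sa, NULL) < 0 || sigaction(SIGUSR2, &sa, NULL) < 0)
		return -1;
	return 0;
}

/* Returns 1 (and clears the flag) if a dump was requested since the last call.
 * Several SIGUSR1s arriving before the check are coalesced into one dump. */
int take_dump_request(void)
{
	if (!dump_requested)
		return 0;
	dump_requested = 0;
	return 1;
}

/* Builds "perf.data.sample.<seconds since the epoch>" into buf. */
int sample_file_name(char *buf, size_t len, time_t when)
{
	int n = snprintf(buf, len, "perf.data.sample.%lld", (long long)when);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

When the signal is delivered to the thread sitting in poll(), that call returns early with EINTR, so the loop can check take_dump_request() right away instead of waiting for the next batch of events.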
>
>
> [To Do]
>
> If the idea above is acceptable, we want to implement it in the following steps:
> 1 Develop perf's user space ring buffer, in which newer sample info replaces
> older sample info.
> 2 Classify the tracing info into two kinds. The first kind is plain sample
> events, of which we only need the ones created during a window just before the
> exception appears; we call it Optional tracing info, and perf should write it
> to the user space ring buffer. The second kind is the tracing info required to
> analyze the sample events, such as mmap_event, which provides the DSO-related
> info; we call it Auxiliary tracing info, and perf should write it to
> perf.data.auxiliary, or directly to perf.data as before.
> 3 Develop a trigger for perf that makes perf dump its user space ring buffer to
> perf.data.sample.TIME#, or simply append it to perf.data. The trigger may have
> three interfaces: eBPF event, signal and socket communication.
> 4 Make perf report, perf script, etc. able to analyze perf.data.auxiliary,
> perf.data.sample.TIME#, or the final synthetic perf.data combined from
> perf.data.auxiliary and all the perf.data.sample.TIME# files.
> 5 For daemon mode, perf should also support running in the background all the
> time and exiting on a trigger.
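Step 2's split could route each record by the type field of its header. A minimal C sketch, assuming the record stream uses the layout of struct perf_event_header (type, misc, size) and the PERF_RECORD_* values from linux/perf_event.h, copied here as local constants so it builds without kernel headers; route_records, is_optional and the sink callbacks are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record type values from enum perf_event_type in linux/perf_event.h,
 * copied locally so the sketch builds without kernel headers. */
enum {
	REC_MMAP = 1, REC_LOST = 2, REC_COMM = 3, REC_EXIT = 4,
	REC_THROTTLE = 5, REC_UNTHROTTLE = 6, REC_FORK = 7,
	REC_READ = 8, REC_SAMPLE = 9, REC_MMAP2 = 10,
};

/* Same layout as struct perf_event_header: size covers the whole record. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

typedef void (*rec_sink)(const unsigned char *rec, size_t size, void *ctx);

/* Hypothetical rule from step 2: samples are "Optional" (may be overwritten
 * in the ring buffer); everything else, e.g. mmap/comm/fork, is "Auxiliary". */
static int is_optional(uint32_t type)
{
	return type == REC_SAMPLE;
}

/* Walks back-to-back records in buf and hands each one to the matching sink.
 * Returns the number of bytes consumed; stops at a truncated or bad record. */
size_t route_records(const unsigned char *buf, size_t len,
		     rec_sink optional, rec_sink auxiliary, void *ctx)
{
	size_t off = 0;

	while (len - off >= sizeof(struct rec_header)) {
		struct rec_header h;

		memcpy(&h, buf + off, sizeof(h)); /* avoid unaligned access */
		if (h.size < sizeof(h) || h.size > len - off)
			break;
		if (is_optional(h.type))
			optional(buf + off, h.size, ctx);
		else
			auxiliary(buf + off, h.size, ctx);
		off += h.size;
	}
	return off;
}
```

Sending everything except samples to the auxiliary side is the conservative choice: a sample still in the ring buffer may refer to a mapping created long before, so presumably those records must never age out.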
>
>
> [Conclusion]
>
> In effect, this mechanism makes perf's tracing more refined and efficient. We
> regard the size of perf.data and the cost of writing it as expensive resources,
> which should be spent carefully and only on the data around the exception. The
> mechanism can be used in both daemon and non-daemon mode. It offers another way
> to filter tracing events, besides eBPF.
>
> Thanks,
> ------
> Yunlong Song
>

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.