[RFC resend] Perf: Trigger and dump sample info to perf.data from user space ring buffer

From: Yunlong Song
Date: Mon Sep 21 2015 - 23:17:12 EST


[Problem Background]

We want to run perf in daemon mode and collect traces when an exception
(e.g., a machine crash or a drop in application performance) occurs. Perf may
run for a long time (days, weeks, or even months), since we do not know when
the exception will occur, only that it will occur at some point (especially
for a beta product). If we simply use "perf record" as usual, two problems
arise as time goes by: 1 writing perf.data generates a large amount of I/O,
which may hurt performance considerably; 2 perf.data keeps growing. Although
eBPF can reduce the traces in the normal case, in our case perf runs in daemon
mode for a long time, so the traces still accumulate as time goes by.


[One Solution]

In fact, we only need the sample info created during a short window just
before the exception appears. We do not care about sample info from any other
time. So perhaps we should change the way perf produces its perf.data, as
follows:
1 Let perf allocate a user space ring buffer of a reasonable size, big enough
to store all the tracing info we care about (for a while) before the exception
appears;
2 Dump the sample info to the user space ring buffer. The size of the ring
buffer is constant, so newer sample info replaces older sample info;
3 After some kind of trigger (maybe an eBPF event, a signal, or socket
communication) caused by the exception, the user space ring buffer dumps all
its tracing info to perf.data.sample.TIME#
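
The overwrite behaviour in steps 1 and 2 can be sketched as follows. This is
only an illustration in Python, not perf code: the class name SampleRing and
its methods are hypothetical, records are treated as opaque byte strings, and
the buffer is sized in bytes on the assumption that "-M size" would be a byte
count.

```python
from collections import deque


class SampleRing:
    """Fixed-capacity buffer: pushing past capacity evicts the oldest records."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.records = deque()

    def push(self, record):
        if len(record) > self.capacity:
            return  # a record larger than the whole buffer can never fit; drop it
        # Evict from the old end until the new record fits.
        while self.used + len(record) > self.capacity:
            self.used -= len(self.records.popleft())
        self.records.append(record)
        self.used += len(record)

    def snapshot(self):
        """Return the buffered records, oldest first, as one byte string."""
        return b"".join(self.records)


ring = SampleRing(capacity_bytes=8)
for rec in (b"aaaa", b"bbbb", b"cccc"):
    ring.push(rec)
print(ring.snapshot())  # b'bbbbcccc' (the oldest record was evicted)
```

Evicting whole records, rather than overwriting raw bytes, keeps every record
in a dump complete, at the cost of tracking record boundaries.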


[Use Style]

We can add an option (such as "-M size" or "--memory size") to define the
size of the user space ring buffer and activate the user space ring buffer mode
described above. For convenience, we can add "--daemon" to make perf run as a
daemon.
# perf record -M size -e ebpf.o -e cycles -g -F 100 -a sleep 1000000 &
Or
# perf record -M size -e ebpf.o -e cycles -g -F 100 -a --daemon

When the exception appears, a signal is sent to perf (an eBPF event or socket
communication may also be used)
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1

When the 2nd exception appears
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2

......

When the Nth exception appears
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2 ... perf.data.sample.TIMEN

We can use perf report or perf script to analyze each perf.data.sample.TIME#

Or finally, we can kill perf and combine perf.data.auxiliary with all the
perf.data.sample.TIME# to create all-in-one perf.data
# kill -SIGUSR2 1234
# ls
perf.data
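
The signal-driven flow above could look roughly like this. It is a sketch in
Python rather than perf's C, and everything in it is an assumption: the names
on_sigusr1, on_sigusr2, dump_ring and run are hypothetical, the ring is a
deque bounded by record count, and TIME# is taken to be a nanosecond
timestamp. Combining the dumps into one perf.data is left out.

```python
import os
import signal
import time
from collections import deque

dump_requested = False
stop_requested = False


def on_sigusr1(signum, frame):
    # Only set a flag; file I/O is done in the main loop, not in the handler.
    global dump_requested
    dump_requested = True


def on_sigusr2(signum, frame):
    global stop_requested
    stop_requested = True


def dump_ring(ring, directory="."):
    """Write the buffered records to perf.data.sample.<timestamp>."""
    path = os.path.join(directory, "perf.data.sample.%d" % time.time_ns())
    with open(path, "wb") as f:
        f.write(b"".join(ring))
    return path


def run(next_record, ring_len=1024, directory="."):
    """Buffer records, dump on SIGUSR1, stop on SIGUSR2; return the dump paths."""
    global dump_requested
    signal.signal(signal.SIGUSR1, on_sigusr1)
    signal.signal(signal.SIGUSR2, on_sigusr2)
    ring = deque(maxlen=ring_len)  # keeps only the newest ring_len records
    dumps = []
    while not stop_requested:
        ring.append(next_record())  # next_record() returns one record as bytes
        if dump_requested:
            dump_requested = False
            dumps.append(dump_ring(ring, directory))
    return dumps
```

Keeping the handlers down to setting a flag avoids doing file I/O in signal
context; the same split would apply to a C implementation.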


[To Do]

If the idea mentioned above is OK, we want to implement it in the following steps:
1 Develop perf's user space ring buffer, which can make newer sample info replace
older sample info.
2 Classify the tracing info into two kinds. The first kind is the sample
events themselves; we only need those created (for a while) just before the
exception appears. We can call this Optional tracing info, and perf should
dump it to the user space ring buffer instead of perf.data. The second kind is
the tracing info required to analyze the sample events, such as mmap_event,
which shows the dso-related info. We can call this Auxiliary tracing info, and
perf should dump it into perf.data.auxiliary or just directly into perf.data
as before.
3 Develop a trigger for perf, which makes perf dump its user space ring
buffer to perf.data.sample.TIME#, or just append it to perf.data. The trigger
may include three interfaces: eBPF event, signal, and socket communication.
4 Make perf report, perf script, etc. able to analyze perf.data.auxiliary,
perf.data.sample.TIME#, or the final synthetic perf.data combined from
perf.data.auxiliary and all the perf.data.sample.TIME#
5 For daemon mode, perf should also support running in the background all the
time and being stopped by a trigger.
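
Steps 2 to 4 can be illustrated with a small in-memory model. Again this is a
Python sketch under stated assumptions, not perf code: the Router class and
its methods are hypothetical, and the set of record types treated as Auxiliary
is a guess (the text above only names mmap events). The numeric record types
are the ones from enum perf_event_type. Clearing the ring after each dump, and
placing auxiliary records first when combining, are also assumptions; a real
merge would more likely order records by timestamp.

```python
from collections import deque

# Record type values from enum perf_event_type (linux/perf_event.h).
PERF_RECORD_MMAP = 1
PERF_RECORD_COMM = 3
PERF_RECORD_EXIT = 4
PERF_RECORD_FORK = 7
PERF_RECORD_SAMPLE = 9

# Assumption: which types count as Auxiliary is chosen here for illustration.
AUXILIARY_TYPES = {PERF_RECORD_MMAP, PERF_RECORD_COMM,
                   PERF_RECORD_EXIT, PERF_RECORD_FORK}


class Router:
    def __init__(self, ring_len):
        self.auxiliary = []                 # always kept (perf.data.auxiliary)
        self.ring = deque(maxlen=ring_len)  # Optional info, newest records only
        self.dumps = []                     # one entry per trigger (TIME1, TIME2, ...)

    def handle(self, rec_type, payload):
        if rec_type in AUXILIARY_TYPES:
            self.auxiliary.append((rec_type, payload))
        else:
            self.ring.append((rec_type, payload))

    def trigger(self):
        # Assumption: the ring is emptied after a dump so dumps do not overlap.
        self.dumps.append(list(self.ring))
        self.ring.clear()

    def combine(self):
        """All-in-one view: auxiliary records followed by every dump in order."""
        merged = list(self.auxiliary)
        for dump in self.dumps:
            merged.extend(dump)
        return merged


router = Router(ring_len=2)
router.handle(PERF_RECORD_MMAP, "libc.so")
for i in range(4):
    router.handle(PERF_RECORD_SAMPLE, "sample%d" % i)
router.trigger()
print(router.combine())
```

Here the mmap record survives even though it arrived before the samples that
were evicted, which is the reason for keeping Auxiliary info out of the ring.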


[Conclusion]

In effect, this mechanism makes perf's tracing more refined and more
efficient. We regard the size of perf.data and the cost of writing perf.data as
expensive resources, which had better be used carefully and only for the
exception we are targeting. This mechanism can be used in both daemon mode and
non-daemon mode. Compared to eBPF, it is another way to filter tracing events,
from a different angle.


--
Thanks,
Yunlong Song
