Re: [PATCH 2/2] perf record: Add --dry-run option to check cmdline options

From: Arnaldo Carvalho de Melo
Date: Mon Jun 20 2016 - 15:35:45 EST


Em Mon, Jun 20, 2016 at 12:16:55PM -0600, David Ahern escreveu:
> On 6/20/16 12:13 PM, Arnaldo Carvalho de Melo wrote:
> > 'perf cc' seems sensible, and has the added bonus of being one letter
> > shorter :-)

> perf is now a general front-end to a compiler?

Well, it has been for quite a while already; what we're talking about here
is to have this:

# cat filter.c
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))

SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
int func(void *ctx, int err, long nsec)
{
	return nsec > 1000;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
# perf trace -e nanosleep --event filter.c usleep 1
0.063 ( 0.063 ms): usleep/8041 nanosleep(rqtp: 0x7fff62bead80) = 0
# perf trace -e nanosleep --event filter.c usleep 2
0.008 ( 0.008 ms): usleep/8325 nanosleep(rqtp: 0x7ffc2afdf3b0) ...
0.008 ( ): perf_bpf_probe:func:(ffffffff811137d0) tv_nsec=2000)
0.070 ( 0.070 ms): usleep/8325 ... [continued]: nanosleep()) = 0
#

But without calling the clang compiler under the hood every time, i.e. by
pre-building the .o file, which will then be used when present.

What Wang did was to make that possible by adding this to ~/.perfconfig:

# cat ~/.perfconfig
[llvm]
dump-obj = true
#

This way, when we run we get:

# trace -e nanosleep --event filter.c usleep 6
LLVM: dumpping filter.o
0.008 ( 0.008 ms): usleep/9189 nanosleep(rqtp: 0x7fff97a704d0 ) ...
0.008 ( ): perf_bpf_probe:func:(ffffffff811137d0) tv_nsec=6000)
0.070 ( 0.070 ms): usleep/9189 ... [continued]: nanosleep()) = 0
#
# file filter.o
filter.o: ELF 64-bit LSB relocatable, no machine, version 1 (SYSV), not stripped
# readelf -SW filter.o
There are 7 section headers, starting at offset 0x148:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 0000e8 00005a 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
  [ 3] func=hrtimer_nanosleep rqtp->tv_nsec PROGBITS        0000000000000000 000040 000028 00  AX  0   0  8
  [ 4] license           PROGBITS        0000000000000000 000068 000004 00  WA  0   0  1
  [ 5] version           PROGBITS        0000000000000000 00006c 000004 00  WA  0   0  4
  [ 6] .symtab           SYMTAB          0000000000000000 000070 000078 18      1   2  8
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
#

The idea is to generate this .o file explicitly and then, when it is found and
somehow checked to match what is in filter.c, short-circuit the process,
bypassing the clang call and using filter.o directly.

This will remove the need for having clang in embedded systems, for instance,
and will speed up using eBPF scripts with perf.

- Arnaldo