Re: [RFC PATCH 00/13] perf tools: Support uBPF script

From: Alexei Starovoitov
Date: Wed Apr 20 2016 - 18:06:23 EST


On Wed, Apr 20, 2016 at 06:01:40PM +0000, Wang Nan wrote:
> This patch set allows perf to invoke user space BPF scripts at certain
> points. uBPF scripts and kernel BPF scripts reside in one BPF object.
> They communicate with each other with BPF maps. uBPF scripts can invoke
> helper functions provided by perf.
>
> At least the following new features can be achieved with uBPF support:
>
> 1) Report statistical results:
> Like DTrace, perf prints a statistical report before it quits. No need to
> extract data using 'perf report'. The statistical method is controlled by
> the user.
>
> 2) Control perf's behavior:
> Dynamically adjust the period of different events. The policy is defined by
> the user.
>
> The uBPF library must be installed before compiling. It can be found on GitHub:
>
> https://github.com/iovisor/ubpf.git
>
> The following is an example:
>
> Using the BPF script attached at the bottom of this commit message, one
> can print a histogram of write sizes before perf exits, like this:
>
> # ~/perf record -a -e ./test_ubpf.c &
> [1] 16800
> # dd if=/dev/zero of=/dev/null bs=512 count=5000
> 5000+0 records in
> 5000+0 records out
> 2560000 bytes (2.6 MB) copied, 0.00552838 s, 463 MB/s
> # dd if=/dev/zero of=/dev/null bs=2048 count=5000
> 5000+0 records in
> 5000+0 records out
> 10240000 bytes (10 MB) copied, 0.0188971 s, 542 MB/s
> # fg
> ^C <--- *Press Ctrl-c*
> 2^^0: 47
> 2^^1: 13
> 2^^2: 4
> 2^^3: 130
> 2^^4: 11
> 2^^5: 1051
> 2^^6: 486
> 2^^7: 4863
> 2^^8: 0
> 2^^9: 5003
> 2^^10: 4
> 2^^11: 5003
> 2^^12: 1
> 2^^13: 0
> 2^^14: 0
> 2^^15: 0
> 2^^16: 0
> 2^^17: 0
> 2^^18: 0
> 2^^19: 0
> 2^^20: 0
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.788 MB perf.data ]
>
> Here is test_ubpf.c.
>
> /************ BEGIN ***************/
> #include <uapi/linux/bpf.h>
> #define SEC(NAME) __attribute__((section(NAME), used))
> struct bpf_map_def {
> 	unsigned int type;
> 	unsigned int key_size;
> 	unsigned int value_size;
> 	unsigned int max_entries;
> };
>
> #define BPF_ANY 0
>
> static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
> 	(void *)BPF_FUNC_map_lookup_elem;
>
> static inline unsigned int log2(unsigned int v)
> {
> 	unsigned int r;
> 	unsigned int shift;
>
> 	r = (v > 0xFFFF) << 4; v >>= r;
> 	shift = (v > 0xFF) << 3; v >>= shift; r |= shift;
> 	shift = (v > 0xF) << 2; v >>= shift; r |= shift;
> 	shift = (v > 0x3) << 1; v >>= shift; r |= shift;
> 	r |= (v >> 1);
> 	return r;
> }
>
> static inline unsigned int log2l(unsigned long v)
> {
> 	unsigned int hi = v >> 32;
> 	if (hi)
> 		return log2(hi) + 32;
> 	else
> 		return log2(v);
> }
>
> struct bpf_map_def SEC("maps") my_hist_map = {
> 	.type = BPF_MAP_TYPE_ARRAY,
> 	.key_size = sizeof(int),
> 	.value_size = sizeof(long),
> 	.max_entries = 21,
> };
>
> SEC("sys_write=sys_write count")
> int sys_write(void *ctx, int err, long write_size)
> {
> 	long *value;
> 	int key = 0;
>
> 	if (err)
> 		return 0;
>
> 	key = log2l(write_size);
> 	if (key > 20)
> 		key = 20;
> 	value = map_lookup_elem(&my_hist_map, &key);
> 	if (!value)
> 		return 0;
> 	__sync_fetch_and_add(value, 1);
> 	return 0;
> }
> char _license[] SEC("license") = "GPL";
> u32 _version SEC("version") = LINUX_VERSION_CODE;
>
> /* The following ugly magic numbers can be found in tools/perf/util/ubpf-helpers-list.h */
> static int (*ubpf_memcmp)(void *s1, void *s2, unsigned int n) = (void *)1;
> static void (*ubpf_memcpy)(void *d, void *s, unsigned int size) = (void *)2;
> static int (*ubpf_strcmp)(void *s1, void *s2) = (void *)3;
> static int (*ubpf_printf)(char *fmt, ...) = (void *)4;
> static int (*ubpf_map_lookup_elem)(void *map_desc, void *key, void *value) = (void *)5;
> static int (*ubpf_map_update_elem)(void *map_desc, void *key, void *value, unsigned long long flags) = (void *)6;
> static int (*ubpf_map_get_next_key)(void *map_desc, void *key, void *value) = (void *)7;
>
> SEC("UBPF;perf_record_end")
> int perf_record_end(int samples)
> {
> 	int i;
> 	long value;
> 	/* value is a long, so print it with %ld */
> 	char fmt[] = "2^^%d: %ld\n";
>
> 	for (i = 0; i < 21; i++) {
> 		ubpf_map_lookup_elem(&my_hist_map, &i, &value);
> 		ubpf_printf(fmt, i, value);
> 	}
> 	return 0;
> }
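
The bucket arithmetic in the quoted script can be checked on its own.
A minimal sketch in plain C, mirroring log2()/log2l() and the clamp in
sys_write(). hist_log2() and hist_bucket() are hypothetical names
(log2 is renamed only to avoid clashing with the C library's log2()):

```c
/* Same bit trick as log2() in the quoted test_ubpf.c, renamed
 * (hypothetical name) so it does not collide with libm's log2(). */
static unsigned int hist_log2(unsigned int v)
{
	unsigned int r;
	unsigned int shift;

	r = (v > 0xFFFF) << 4; v >>= r;
	shift = (v > 0xFF) << 3; v >>= shift; r |= shift;
	shift = (v > 0xF) << 2; v >>= shift; r |= shift;
	shift = (v > 0x3) << 1; v >>= shift; r |= shift;
	r |= (v >> 1);
	return r;
}

/* Slot selection as in sys_write(): floor(log2(size)), clamped to the
 * last of the 21 slots. Uses unsigned long long so the 32-bit shift is
 * well defined on any host. */
static int hist_bucket(unsigned long long write_size)
{
	unsigned int hi = (unsigned int)(write_size >> 32);
	int key;

	if (hi)
		key = (int)hist_log2(hi) + 32;
	else
		key = (int)hist_log2((unsigned int)write_size);
	return key > 20 ? 20 : key;
}
```

With this, a 512-byte write lands in slot 9 and a 2048-byte write in
slot 11, which is consistent with the two 5003 rows in the output above.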

Interesting!
If bpf is used for both kernel and user side programs, we can allow
almost arbitrary C code for the user side.
There is no need to be limited to a fixed set of helpers.
There is no verifier in user space either.
Just call 'printf("string")' directly.
That wouldn't even require changing the interpreter.
Also, ubpf was written from scratch under the Apache 2.0 license, while
perf is GPL, so you can just link kernel/bpf/core.o directly instead of
using an external library.
I really do mean linking the .o file compiled for the kernel.
Provide dummy kfree/kmalloc and it will link fine, since perf
will only be calling __bpf_prog_run(), which is 99% independent of the kernel.
I used to do exactly that long ago while performance tuning the interpreter.
Another option is to fork the interpreter for perf, but I don't like it at all.
Compiling the same bpf/core.c once for kernel and once for perf is another option,
but imo linking core.o is easier.
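
A minimal sketch of the dummy-allocator idea, under assumptions: the
gfp_t typedef is a stand-in, and the signatures below only approximate
the kernel's. Which symbols core.o actually leaves undefined depends on
the kernel version and config ('nm -u' on the object will list them),
so the names may need adjusting:

```c
#include <stdlib.h>

/* Stand-in for the kernel's gfp_t; allocation flags mean nothing in
 * user space, so they are ignored. */
typedef unsigned int gfp_t;

/* Hypothetical user-space stub: forwards to malloc() so that an object
 * file built for the kernel can resolve its allocator reference. */
void *kmalloc(size_t size, gfp_t flags)
{
	(void)flags;
	return malloc(size);
}

/* Hypothetical user-space stub: forwards to free(). Like the kernel's
 * kfree(), it accepts NULL. */
void kfree(const void *ptr)
{
	free((void *)ptr);
}
```

These would be compiled into perf and linked together with the kernel
object, so the interpreter's allocator references resolve to libc.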

In general, this set, and bpf in user space overall, makes sense only
if we allow much more flexible C code for user space.
If it's limited to ubpf_* helpers, that will quickly become suboptimal.
Another alternative is to use luajit for user space scripting like
we do in bcc. That gives full flexibility with good performance.
If we can do 'restricted C into bpf' for kernel and 'full C into bpf'
for user space, that would be a great model. Note that llvm doesn't care
what the C looks like. You can call any function in C and use loops.
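
As a rough illustration of what 'full C' on the user side might look
like: an ordinary loop, arbitrary logic, and a direct printf() call
instead of ubpf_* helper pointers. This is only a sketch. print_hist()
is a hypothetical name, the plain array stands in for the BPF map, and
how perf would resolve printf() for the bpf program is not covered:

```c
#include <stdio.h>

/* Hypothetical user-side hook: print the non-empty histogram slots and
 * return how many were printed. 'hist' is a plain array standing in
 * for the contents of my_hist_map. */
int print_hist(const long *hist, int nr_slots)
{
	int i, printed = 0;

	for (i = 0; i < nr_slots; i++) {
		if (!hist[i])
			continue;	/* skip empty slots */
		printf("2^^%d: %ld\n", i, hist[i]);
		printed++;
	}
	return printed;
}
```

Built as native C, this compiles and runs as-is; the point is that
nothing in it depends on a fixed helper table.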