Re: [PATCH perf,bpf 0/5] reveal invisible bpf programs

From: Peter Zijlstra
Date: Mon Nov 26 2018 - 09:50:16 EST


On Thu, Nov 22, 2018 at 06:13:32PM +0000, Song Liu wrote:
> Hi Peter,
>
> > On Nov 22, 2018, at 1:32 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Nov 21, 2018 at 11:54:57AM -0800, Song Liu wrote:
> >> Changes RFC -> PATCH v1:
> >>
> >> 1. In perf-record, poll vip events in a separate thread;
> >> 2. Add tag to bpf prog name;
> >> 3. Small refactorings.
> >>
> >> Original cover letter (with minor revisions):
> >>
> >> This is to follow up Alexei's early effort to show bpf programs

> >> In this version, PERF_RECORD_BPF_EVENT is introduced to send real time BPF
> >> load/unload events to user space. In user space, perf-record is modified
> >> to listen to these events (through a dedicated ring buffer) and generate
> >> detailed information about the program (struct bpf_prog_info_event). Then,
> >> perf-report translates these events into proper symbols.
> >>
> >> With this set, perf-report will show bpf program as:
> >>
> >> 18.49% 0.16% test [kernel.vmlinux] [k] ksys_write
> >> 18.01% 0.47% test [kernel.vmlinux] [k] vfs_write
> >> 17.02% 0.40% test bpf_prog [k] bpf_prog_07367f7ba80df72b_
> >> 16.97% 0.10% test [kernel.vmlinux] [k] __vfs_write
> >> 16.86% 0.12% test [kernel.vmlinux] [k] comm_write
> >> 16.67% 0.39% test [kernel.vmlinux] [k] bpf_probe_read
> >>
> >> Note that, the program name is still work in progress, it will be cleaner
> >> with function types in BTF.
> >>
> >> Please share your comments on this.
> >
> > So I see:
> >
> > kernel/bpf/core.c:void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> >
> > which should already provide basic symbol information for extant eBPF
> > programs, right?
>
> Right, if the BPF program is still loaded when perf-report runs, symbols
> are available.

Good, that is not something that was clear. The Changelog seems to imply
we need this new stuff in order to observe symbols.

> > And (AFAIK) perf uses /proc/kcore for annotate on the current running
> > kernel (if not, it really should, given alternatives, jump_labels and
> > all other other self-modifying code).
> >
> > So this fancy new stuff is only for the case where your profile spans
> > eBPF load/unload events (which should be relatively rare in the normal
> > case, right), or when you want source annotated asm output (I normally
> > don't bother with that).
>
> This patch set adds two pieces of information:
> 1. At the beginning of perf-record, save info of existing BPF programs;
> 2. Gather information of BPF programs load/unload during perf-record.
>
> (1) is all in user space. It is necessary to show symbols of BPF program
> that are unloaded _after_ perf-record. (2) needs PERF_RECORD_BPF_EVENT
> from the ring buffer. It covers BPF program loaded during perf-record
> (perf record -- bpf_test).

I'm saying that if you given them symbols; most people won't need any of
that ever.

And just tracking kallsyms is _much_ cheaper than either 1 or 2. Alexei
was talking fairly big amounts of data per BPF prog. Dumping and saving
that sounds like pointless overhead for 99% of the users.

> > That is; I would really like this fancy stuff to be an optional extra
> > that is typically not needed.
> >
> > Does that make sense?
>
> (1) above is always enabled with this set. I added option no-bpf-events
> to disable (2). I guess you prefer the (2) is disabled by default, and
> enabled with an option?

I'm saying neither should be default enabled. Instead we should do
recording and tracking by default.

That gets people symbol information on BPF stuff, which is likely all
they ever need.

If they then decide they need more, then and only then can they enable
the fancy pants bpf-synchronous-syscall nonsense thingy. And like I said
before; I don't bother with source annotated asm output for
perf-annotate.

And if the bpf prog is still loaded, kcore should contain the raw jitted
asm just fine; you again don't need the bpf syscall stuff.


Now, I'm not saying this patch set is useless; but I'm saying most
people should not need this, and it is massive overkill for the needs of
most people.