Re: [RFC bpf-next 0/4] bpf: Speed up symbol resolving in kprobe multi link

From: Andrii Nakryiko
Date: Mon Apr 11 2022 - 18:15:26 EST


On Sat, Apr 9, 2022 at 1:24 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Fri, Apr 08, 2022 at 04:29:22PM -0700, Alexei Starovoitov wrote:
> > On Thu, Apr 07, 2022 at 02:52:20PM +0200, Jiri Olsa wrote:
> > > hi,
> > > sending an additional fix for symbol resolving in kprobe multi link
> > > requested by Alexei and Andrii [1].
> > >
> > > This speeds up bpftrace kprobe attachment, when using pure symbols
> > > (3344 symbols) to attach:
> > >
> > > Before:
> > >
> > > # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* { } i:ms:1 { exit(); }'
> > > ...
> > > 6.5681 +- 0.0225 seconds time elapsed ( +- 0.34% )
> > >
> > > After:
> > >
> > > # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* { } i:ms:1 { exit(); }'
> > > ...
> > > 0.5661 +- 0.0275 seconds time elapsed ( +- 4.85% )
> > >
> > >
> > > There are 2 reasons I'm sending this as an RFC, though:
> > >
> > > - I added a test that measures attachment speed on all possible functions
> > > from available_filter_functions, which is 48712 functions on my setup.
> > > The attach/detach time for that is under 2 seconds and the test will
> > > fail if it's longer than that, which might happen on different setups
> > > or on a loaded machine. I'm not sure what the best solution is yet;
> > > a separate bench application, perhaps?
> >
> > are you saying there is a bug in the code that you're still debugging?
> > or just worried about time?
>
> just the time; I can make the test fail (cross the 2-second limit)
> when the machine is loaded, e.g. while running a kernel build
>
> but I couldn't reproduce this with just a parallel test_progs run
>
> >
> > I think it's better for it to be a part of the selftests.
> > CI will take an extra 2 seconds to run.
> > That's fine. It's a good stress test.

I agree it's a good stress test, but I disagree on adding it as a
selftest. The speed will depend on the actual host machine: in VMs it
will be slower, on busier machines it will be slower, etc. Generally,
depending on some specific timing just causes unnecessary maintenance
headaches. We can have this as a benchmark, if someone thinks it's
very important. I don't feel strongly about having it regularly
executed, as it's extremely unlikely that we'll accidentally regress
from NlogN back to N^2. And if there is some X% slowdown, such a
selftest is unlikely to alarm us anyway. Sporadic failures will annoy
us way before that, to the point of blacklisting this selftest in CI
at the very least.


>
> ok, great
>
> thanks,
> jirka
>
> >
> > > - copy_user_syms function potentially allocates a lot of memory (~6MB in my
> > > tests with attaching ~48k functions). I haven't seen this fail yet,
> > > but it might need to be changed to allocate memory gradually if needed,
> > > do we care? ;-)
> >
> > replied in the other email.
> >
> > Thanks for working on this!