Re: [RFC bpf-next 3/4] bpf: Resolve symbols with kallsyms_lookup_names for kprobe multi link

From: Andrii Nakryiko
Date: Mon Apr 11 2022 - 18:16:03 EST


On Fri, Apr 8, 2022 at 4:26 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Apr 07, 2022 at 02:52:23PM +0200, Jiri Olsa wrote:
> > Use the kallsyms_lookup_names function to speed up symbol lookup in
> > kprobe multi link attachment, replacing the current
> > kprobe_multi_resolve_syms function with it.
> >
> > This speeds up bpftrace kprobe attachment.
> >
> > Before:
> >
> > # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* { } i:ms:1 { exit(); }'
> > ...
> > 6.5681 +- 0.0225 seconds time elapsed ( +- 0.34% )
> >
> > After:
> >
> > # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* { } i:ms:1 { exit(); }'
> > ...
> > 0.5661 +- 0.0275 seconds time elapsed ( +- 4.85% )
> >
> > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> > ---
> > kernel/trace/bpf_trace.c | 123 +++++++++++++++++++++++----------------
> > 1 file changed, 73 insertions(+), 50 deletions(-)
> >
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index b26f3da943de..2602957225ba 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -2226,6 +2226,72 @@ struct bpf_kprobe_multi_run_ctx {
> >  	unsigned long entry_ip;
> >  };
> >
> > +struct user_syms {
> > +	const char **syms;
> > +	char *buf;
> > +};
> > +
> > +static int copy_user_syms(struct user_syms *us, void __user *usyms, u32 cnt)
> > +{
> > +	const char __user **usyms_copy = NULL;
> > +	const char **syms = NULL;
> > +	char *buf = NULL, *p;
> > +	int err = -EFAULT;
> > +	unsigned int i;
> > +	size_t size;
> > +
> > +	size = cnt * sizeof(*usyms_copy);
> > +
> > +	usyms_copy = kvmalloc(size, GFP_KERNEL);
> > +	if (!usyms_copy)
> > +		return -ENOMEM;
> > +
> > +	if (copy_from_user(usyms_copy, usyms, size))
> > +		goto error;
> > +
> > +	err = -ENOMEM;
> > +	syms = kvmalloc(size, GFP_KERNEL);
> > +	if (!syms)
> > +		goto error;
> > +
> > +	/* TODO: this potentially allocates a lot of memory (~6MB in my tests
> > +	 * with attaching ~40k functions). I haven't seen it fail yet,
> > +	 * but it could be changed to allocate memory gradually if needed.
> > +	 */
>
> Why would 6MB kvmalloc fail?
> If we don't have such memory the kernel will be ooming soon anyway.
> I don't think we'll see this kvmalloc triggering oom in practice.
> The verifier allocates a lot more memory to check large programs.
>
> imo this approach is fine. It's simple.
> Trying to do gradual alloc with realloc would be just guessing.
>
> Another option would be to ask user space (libbpf) to do the sort.
> There are pros and cons.
> This vmalloc+sort is slightly better imo.

Totally agree, the simpler the better. Also, when you are attaching to
tens of thousands of probes, 6MB isn't a lot of memory for whatever
you are trying to do anyway :)
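
FWIW the math checks out, too: with KSYM_NAME_LEN at 128, ~40k names is
~5MB for the string buffer plus two small pointer arrays, so ~6MB in
total. And once everything is copied in, the kernel-side sort is
trivial as well. Roughly (just a sketch, not the code from this series,
and the helper names are made up):

/* Sketch: sort the kernel-resident copy of the symbol name pointers
 * so a later kallsyms walk can binary-search them. Assumes the names
 * were already copied in by something like copy_user_syms() above.
 */
#include <linux/sort.h>
#include <linux/string.h>
#include <linux/types.h>

static int symbols_cmp(const void *a, const void *b)
{
	const char * const *sym_a = a;
	const char * const *sym_b = b;

	return strcmp(*sym_a, *sym_b);
}

static void sort_user_syms(const char **syms, u32 cnt)
{
	/* lib/sort.c heapsort, no extra allocation needed */
	sort(syms, cnt, sizeof(*syms), symbols_cmp, NULL);
}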

Even if libbpf had to sort it, the kernel would probably have to
validate that. Also, for binary search you'd still need to read in the
string itself, but if you did that "on demand", we'd be adding TOCTOU
and other headaches.
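
To put the binary search point in code: with the strings copied in up
front and sorted, checking each name the kallsyms walk hands us is a
single bsearch() over kernel memory, with no user accesses (and no
TOCTOU) on that path. Another rough sketch with made-up helper names:

/* Sketch: given the sorted kernel-side copy, decide whether a symbol
 * reported by the kallsyms walk is one of the requested ones.
 */
#include <linux/bsearch.h>
#include <linux/string.h>
#include <linux/types.h>

static int sym_cmp(const void *key, const void *elem)
{
	const char *name = key;
	const char * const *sym = elem;

	return strcmp(name, *sym);
}

static bool sym_requested(const char *name, const char **syms, u32 cnt)
{
	return bsearch(name, syms, cnt, sizeof(*syms), sym_cmp) != NULL;
}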

Simple is good.

>
> > +	size = cnt * KSYM_NAME_LEN;
> > +	buf = kvmalloc(size, GFP_KERNEL);
> > +	if (!buf)
> > +		goto error;
> > +
> > +	for (p = buf, i = 0; i < cnt; i++) {
> > +		err = strncpy_from_user(p, usyms_copy[i], KSYM_NAME_LEN);
> > +		if (err == KSYM_NAME_LEN)
> > +			err = -E2BIG;
> > +		if (err < 0)
> > +			goto error;

[...]
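
One last thought on the user-space side: assuming the
bpf_program__attach_kprobe_multi_opts() API that went into libbpf
alongside this series, feeding the symbol list that copy_user_syms()
ingests would look roughly like this (sketch only, the program and
symbol names are made up):

#include <bpf/libbpf.h>

static struct bpf_link *attach_by_syms(struct bpf_program *prog)
{
	const char *syms[] = { "ksys_read", "ksys_write" };
	LIBBPF_OPTS(bpf_kprobe_multi_opts, opts,
		.syms = syms,
		.cnt = sizeof(syms) / sizeof(syms[0]),
	);

	/* pattern is NULL because we pass an explicit symbol list;
	 * the kernel copies, sorts and resolves the names in one pass
	 */
	return bpf_program__attach_kprobe_multi_opts(prog, NULL, &opts);
}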