Re: [RFC bpf-next 4/4] selftests/bpf: Add attach bench test

From: Steven Rostedt
Date: Thu Apr 28 2022 - 19:53:14 EST


On Thu, 28 Apr 2022 16:32:20 -0700
Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > The job of recordmcount is to create a section of all the locations that
> > call fentry. That is EXACTLY what it did. No bug there! It did its job.
>
> But that __fentry__ call is not part of __bpf_tramp_exit, actually.
> Whether to call it a bug or limitation is secondary. It marks
> __bpf_tramp_exit as attachable through kprobe/ftrace while it really
> isn't.

I'm confused by what you mean by "marks __bpf_tramp_exit as attachable"?
What does? Where does it get that information? Does it read
available_filter_functions?

recordmcount isn't responsible for any of that. You are thinking of
kallsyms, specifically *printf("%ps"), because that's where the name comes
from. Anytime you print an address with "%ps" that lies in a weak function
that has been overridden, it will give you the symbol before it. I guess
you can call it a bug in the "%ps" logic.
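
To illustrate, here is a user-space Python sketch (not the kernel's code) of
a nearest-preceding-symbol lookup, which is roughly the behaviour described
above. The start address of __bpf_tramp_exit and the call site come from the
dump quoted in this thread; the "next_sym" entry and the helper name
nearest_symbol() are made up for the example.

```python
import bisect

# Toy symbol table sorted by start address: (start, name).
# __bpf_tramp_exit's address is from the readelf output in this thread;
# "next_sym" is a made-up entry standing in for whatever symbol follows.
SYMS = [
    (0xffffffff811b2ba0, "__bpf_tramp_exit"),
    (0xffffffff811b2bf0, "next_sym"),
]

def nearest_symbol(addr):
    """Return (name, offset) of the last symbol starting at or before addr."""
    starts = [start for start, _ in SYMS]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # addr is below every known symbol
    start, name = SYMS[i]
    return name, addr - start

# The fentry call site recorded at ffffffff811b2be0 has no symbol of its
# own here, so the lookup falls back to the symbol before it.
name, off = nearest_symbol(0xffffffff811b2be0)
print(f"{name}+{off:#x}")  # prints "__bpf_tramp_exit+0x40"
```

The lookup never consults the symbol's size, so an address past the end of
__bpf_tramp_exit still gets its name.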


>
> Below you are saying there is only user confusion. It's not just
> confusion. You'll get an error when you try to attach to
> __bpf_tramp_exit because __bpf_tramp_exit doesn't really have
> __fentry__ preamble and thus the kernel itself will reject it as a
> target. So when you build a generic tracing tool that fetches all the
> attachable kprobes, filters out all the blacklisted ones, you still
> end up with kprobe targets that are not attachable. It's definitely
> more than an inconvenience which I experienced first hand.
>
> Can recordmcount or whoever does this be taught to use proper FUNC
> symbol size to figure out boundaries of the function?

kallsyms needs to do it. All recordmcount does is to give the locations of
the calls to fentry. It only gives addresses and does not give any symbol
information. Stop blaming recordmcount!

>
> $ readelf -s ~/linux-build/default/vmlinux | rg __bpf_tramp_exit
> 129408: ffffffff811b2ba0 63 FUNC GLOBAL DEFAULT 1 __bpf_tramp_exit
>
> So only the first 63 bytes of instruction after __bpf_tramp_exit
> should be taken into account. Everything else doesn't belong to
> __bpf_tramp_exit. So even though objdump pretends that call __fentry__
> is part of __bpf_tramp_exit, it's not.
>
> ffffffff811b2ba0 <__bpf_tramp_exit>:
> ffffffff811b2ba0: 53 push %rbx
> ffffffff811b2ba1: 48 89 fb mov %rdi,%rbx
> ffffffff811b2ba4: e8 97 d2 f2 ff call ffffffff810dfe40 <__rcu_read_lock>
> ffffffff811b2ba9: 48 8b 83 e0 00 00 00 mov 0xe0(%rbx),%rax
> ffffffff811b2bb0: a8 03 test $0x3,%al
> ffffffff811b2bb2: 75 0a jne ffffffff811b2bbe <__bpf_tramp_exit+0x1e>
> ffffffff811b2bb4: 65 48 ff 08 decq %gs:(%rax)
> ffffffff811b2bb8: 5b pop %rbx
> ffffffff811b2bb9: e9 d2 0e f3 ff jmp ffffffff810e3a90 <__rcu_read_unlock>
> ffffffff811b2bbe: 48 8b 83 e8 00 00 00 mov 0xe8(%rbx),%rax
> ffffffff811b2bc5: f0 48 83 28 01 lock subq $0x1,(%rax)
> ffffffff811b2bca: 75 ec jne ffffffff811b2bb8 <__bpf_tramp_exit+0x18>
> ffffffff811b2bcc: 48 8b 83 e8 00 00 00 mov 0xe8(%rbx),%rax
> ffffffff811b2bd3: 48 8d bb e0 00 00 00 lea 0xe0(%rbx),%rdi
> ffffffff811b2bda: ff 50 08 call *0x8(%rax)
> ffffffff811b2bdd: eb d9 jmp ffffffff811b2bb8 <__bpf_tramp_exit+0x18>
> ffffffff811b2bdf: 90 nop
>
> ^^^ ffffffff811b2ba0 + 63 = ffffffff811b2bdf -- this is the end of
> __bpf_tramp_exit
>
> ffffffff811b2be0: e8 3b 9c e9 ff call ffffffff8104c820 <__fentry__>
> ffffffff811b2be5: b8 f4 fd ff ff mov $0xfffffdf4,%eax
> ffffffff811b2bea: c3 ret
> ffffffff811b2beb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>
>


> > One solution is to simply get the end of the function that is provided by
> > kallsyms to make sure the fentry call location is inside the function, and
> > if it is not, then not show that function in available_filter_functions but
> > instead show something like "** unnamed function **" or whatever.

Do the above. The names in available_filter_functions are derived from
kallsyms. There's also a way to ask kallsyms to give you the end address
of the function. The only thing that available_filter_functions does is
print the location found by recordmcount with a "%ps".

If you don't want it to show up in available_filter_functions, then you
need to open code the %ps as a kallsyms lookup and then compare the
address with the end of the function (if it is found). Or fix %ps.
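
As a rough sketch of that check, again in user-space Python rather than
kernel code: look up the preceding symbol, then accept the name only if the
address falls inside [start, start + size). The start address and the size
of 63 are taken from the readelf output quoted in this thread; the helper
names bounded_symbol() and filter_name() are hypothetical. In the kernel I'd
expect something like kallsyms_lookup_size_offset() to supply the size, but
treat that mapping as an assumption.

```python
import bisect

# (start, size, name), sorted by start. Start and size of __bpf_tramp_exit
# are from the readelf output quoted in this thread; nothing else is real.
SYMS = [(0xffffffff811b2ba0, 63, "__bpf_tramp_exit")]

def bounded_symbol(addr, syms=SYMS):
    """Name of the symbol whose [start, start + size) contains addr, else None."""
    starts = [start for start, _, _ in syms]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # addr is below every known symbol
    start, size, name = syms[i]
    # The extra step a plain nearest-symbol lookup skips: check the end.
    return name if addr < start + size else None

def filter_name(addr):
    """What a size-aware available_filter_functions line could show."""
    return bounded_symbol(addr) or "** unnamed function **"

print(filter_name(0xffffffff811b2ba0))  # prints "__bpf_tramp_exit"
print(filter_name(0xffffffff811b2be0))  # prints "** unnamed function **"
```

With the size check in place, the call site at ffffffff811b2be0 is no longer
reported under __bpf_tramp_exit, since 0x40 is past the 63-byte symbol.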

-- Steve