Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes

From: Daniel Xu
Date: Sun Mar 07 2021 - 16:24:53 EST


On Sat, Mar 06, 2021 at 10:13:57AM +0900, Masami Hiramatsu wrote:
> On Fri, 5 Mar 2021 11:16:45 -0800
> Daniel Xu <dxu@xxxxxxxxx> wrote:
>
> > Hi Masami,
> >
> > On Sat, Mar 06, 2021 at 12:38:57AM +0900, Masami Hiramatsu wrote:
> > > Hello,
> > >
> > > Here is a series of patches for kprobes and stacktracer to fix the kretprobe
> > > entries in the kernel stack. This was reported by Daniel Xu. I thought that
> > > was in the bpftrace, but it is actually more generic issue.
> > > So I decided to fix the issue in arch independent part.
> > >
> > > While fixing the issue, I found a bug in ia64 related to kretprobe, which is
> > > fixed by [1/5]. [2/5] and [3/5] is a kind of cleanup before fixing the main
> > > issue. [4/5] is the patch to fix the stacktrace, which involves kretprobe
> > > internal change. And [5/5] removing the stacktrace kretprobe fixup code in
> > > ftrace.
> > >
> > > Daniel, can you also check that this fixes your issue too? I hope it is.
> >
> > Unfortunately, this patch series does not fix the issue I reported.
>
> Ah, OK. This was my mistake I didn't choose ORC unwinder for test kernel.
>
> >
> > I think the reason your tests work is because you're using ftrace and
> > the ORC unwinder is aware of ftrace trampolines (see
> > arch/x86/kernel/unwind_orc.c:orc_ftrace_find).
>
> OK, so it has to be fixed in ORC unwinder.
>
> >
> > bpftrace kprobes go through perf event subsystem (ie not ftrace) so
> > naturally orc_ftrace_find() does not find an associated trampoline. ORC
> > unwinding fails in this case because
> > arch/x86/kernel/kprobes/core.c:trampoline_handler sets
> >
> > regs->ip = (unsigned long)&kretprobe_trampoline;
> >
> > and `kretprobe_trampoline` is marked
> >
> > STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> >
> > so it doesn't have a valid ORC entry. Thus, ORC immediately bails when
> > trying to unwind past the first frame.
>
> Hm, in ftrace case, it decode kretprobe's stackframe and stoped right
> after kretprobe_trampoline callsite.
>
> => kretprobe_trace_func+0x21f/0x340
> => kretprobe_dispatcher+0x73/0xb0
> => __kretprobe_trampoline_handler+0xcd/0x1a0
> => trampoline_handler+0x3d/0x50
> => kretprobe_trampoline+0x25/0x50
> => 0
>
> kretprobe replaces the real return address with kretprobe_trampoline
> and kretprobe_trampoline *calls* trampoline_handler (this part depends
> on architecture implementation).
> Thus, if kretprobe_trampoline has no stack frame information, ORC may
> fails at the first kretprobe_trampoline+0x25, that is different from
> the kretprobe_trampoline+0, so the hack doesn't work.

I'm not sure I follow 100% what you're saying, but assuming you're
asking why bpftrace fails at `kretprobe_trampoline+0` instead of
`kretprobe_trampoline+0x25`:

`regs` is set to &kretprobe_trampoline:

regs->ip = (unsigned long)&kretprobe_trampoline;

Then the kretprobe event is dispatched like this:

kretprobe_trampoline_handler
__kretprobe_trampoline_handler
rp->handler // ie kernel/trace/trace_kprobe.c:kretprobe_dispatcher
kretprobe_perf_func
trace_bpf_call(.., regs)
BPF_PROG_RUN_ARRAY_CHECK
bpf_get_stackid(regs, .., ..) // in bpftrace prog

And then `bpf_get_stackid` unwinds the stack via:

bpf_get_stackid
get_perf_callchain(regs, ...)
perf_callchain_kernel(.., regs)
perf_callchain_store(.., regs->ip) // !!! first unwound entry
unwind_start
unwind_next_frame

In summary: unwinding via BPF begins at regs->ip, which
`trampoline_handler` sets to `&kretprobe_trampoline+0x0`.

> Hmm, how the other inline-asm function makes its stackframe info?
> If I get the kretprobe_trampoline+0, I can fix it up.

So I think I misunderstood the mechanics before. I think `call
trampoline_handler` does set up a new frame. My current guess is that
ftrace doesn't thread through `regs` like the bpf code path does. I'm
not very familiar with ftrace internals so I haven't looked.

>
> > The only way I can think of to fix this issue is to make the ORC
> > unwinder aware of kretprobe (ie the patch I sent earlier). I'm hoping
> > you have another idea if my patch isn't acceptable.
>
> OK, anyway, your patch doesn't care the case that the multiple functions
> are probed by kretprobes. In that case, you'll see several entries are
> replaced by the kretprobe_trampoline. To find it correctly, you have
> to pass a state-holder (@cur of the kretprobe_find_ret_addr())
> to the fixup entries.

I'll see if I can figure something out tomorrow.

Thanks,
Daniel