Re: Getting empty callchain from perf_callchain_kernel()

From: Kairui Song
Date: Sun May 19 2019 - 14:09:48 EST


On Sat, May 18, 2019 at 5:48 AM Song Liu <songliubraving@xxxxxx> wrote:
>
>
>
> > On May 17, 2019, at 2:06 PM, Alexei Starovoitov <ast@xxxxxx> wrote:
> >
> > On 5/17/19 11:40 AM, Song Liu wrote:
> >> +Alexei, Daniel, and bpf
> >>
> >>> On May 17, 2019, at 2:10 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
> >>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
> >>>> some other bfp functions) is now broken, or, strating an unwind
> >>>> directly inside a bpf program will end up strangely. It have following
> >>>> kernel message:
> >>>
> >>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
> >>> follow.
> >>
> >> I guess we need something like the following? (we should be able to
> >> optimize the PER_CPU stuff).
> >>
> >> Thanks,
> >> Song
> >>
> >>
> >> diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c
> >> index f92d6ad5e080..c525149028a7 100644
> >> --- i/kernel/trace/bpf_trace.c
> >> +++ w/kernel/trace/bpf_trace.c
> >> @@ -696,11 +696,13 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_tp = {
> >> .arg5_type = ARG_CONST_SIZE_OR_ZERO,
> >> };
> >>
> >> +static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs);
> >> BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
> >> u64, flags)
> >> {
> >> - struct pt_regs *regs = *(struct pt_regs **)tp_buff;
> >> + struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs);
> >>
> >> + perf_fetch_caller_regs(regs);
> >
> > No. pt_regs is already passed in. It's the first argument.
> > If we call perf_fetch_caller_regs() again the stack trace will be wrong.
> > bpf prog should not see itself, interpreter or all the frames in between.
>
> Thanks Alexei! I get it now.
>
> In bpf_get_stackid_tp(), the pt_regs is get by dereferencing the first field
> of tp_buff:
>
> struct pt_regs *regs = *(struct pt_regs **)tp_buff;
>
> tp_buff points to something like
>
> struct sched_switch_args {
> unsigned long long pad;
> char prev_comm[16];
> int prev_pid;
> int prev_prio;
> long long prev_state;
> char next_comm[16];
> int next_pid;
> int next_prio;
> };
>
> where the first field "pad" is a pointer to pt_regs.
>
> @Kairui, I think you confirmed that current code will give empty call trace
> with ORC unwinder? If that's the case, can we add regs->ip back? (as in the
> first email of this thread.
>
> Thanks,
> Song
>

Hi thanks for the suggestion, yes we can add it should be good an idea
to always have IP when stack trace is not available.
But stack trace is actually still broken, it will always give only one
level of stacktrace (the IP).

--
Best Regards,
Kairui Song