Re: [PATCH 1/2] perf thread-stack: Fix thread stack return from kernel for kernel-only case

From: Arnaldo Carvalho de Melo
Date: Mon Jun 24 2019 - 14:40:05 EST


Em Wed, Jun 19, 2019 at 09:44:28AM +0300, Adrian Hunter escreveu:
> Commit f08046cb3082 ("perf thread-stack: Represent jmps to the start of a
> different symbol") had the side-effect of introducing more stack entries
> before return from kernel space. When user space is also traced, those
> entries are popped before entry to user space, but when user space is not
> traced, they get stuck at the bottom of the stack, making the stack grow
> progressively larger. Fix by detecting a return-from-kernel branch type,
> and popping kernel addresses from the stack then.
>
> Note, the problem and fix affect the exported Call Graph / Tree but not
> the callindent option used by "perf script --call-trace".
>
> Example:
>
> perf-with-kcore record example -e intel_pt//k -- ls
> perf-with-kcore script --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
> ~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db

Trying with you above instructions I hit some snags, 'perf-with-kcore'
expects the first arg to be the directory where it'll find the perf.data
file, etc, right? I.e. to work I had to do:

[root@quaco tmp]# rm -rf bep
[root@quaco tmp]# rm -rf example.db
[root@quaco tmp]# perf-with-kcore record bep -e intel_pt//k -- ls
Recording
Using /root/bin/perf
perf version 5.2.rc4.gd1d5628fa057
/root/bin/perf record -o bep/perf.data -e intel_pt//k -- ls
<SNIP>
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.123 MB bep/perf.data ]
Copying kcore
Done
[root@quaco tmp]# perf-with-kcore script bep --itrace=bep -s ~acme/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
Using /root/bin/perf
perf version 5.2.rc4.gd1d5628fa057
/root/bin/perf script -i bep/perf.data --kallsyms=bep/kcore_dir/kallsyms --itrace=bep -s /home/acme/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
2019-06-24 15:29:54.910209 Creating database ...
2019-06-24 15:29:54.913351 Writing records...
2019-06-24 15:29:58.009226 Adding indexes
2019-06-24 15:29:58.024644 Done
[root@quaco tmp]#

I.e. add 'bep' after 'script' and before '--itrace-bep'.

I'm fixing this up in the comment, please check.

With that in place I can reproduce your results below.


Thanks,

- Arnaldo

> Menu option: Reports -> Context-Sensitive Call Graph
>
> Before: (showing Call Path column only)
>
> Call Path
> â perf
> â ls
> â 12111:12111
> â setup_new_exec
> â __task_pid_nr_ns
> â perf_event_pid_type
> â perf_event_comm_output
> â perf_iterate_ctx
> â perf_iterate_sb
> â perf_event_comm
> â __set_task_comm
> â load_elf_binary
> â search_binary_handler
> â __do_execve_file.isra.41
> â __x64_sys_execve
> â do_syscall_64
> â entry_SYSCALL_64_after_hwframe
> â swapgs_restore_regs_and_return_to_usermode
> â native_iret
> â error_entry
> â do_page_fault
> â error_exit
> â retint_user
> â prepare_exit_to_usermode
> â native_iret
> â error_entry
> â do_page_fault
> â error_exit
> â retint_user
> â prepare_exit_to_usermode
> â native_iret
> â error_entry
> â do_page_fault
> â error_exit
> â retint_user
> â prepare_exit_to_usermode
> â native_iret

T


> After: (showing Call Path column only)
>
> Call Path
> â perf
> â ls
> â 12111:12111
> â setup_new_exec
> â __task_pid_nr_ns
> â perf_event_pid_type
> â perf_event_comm_output
> â perf_iterate_ctx
> â perf_iterate_sb
> â perf_event_comm
> â __set_task_comm
> â load_elf_binary
> â search_binary_handler
> â __do_execve_file.isra.41
> â __x64_sys_execve
> â do_syscall_64
> â entry_SYSCALL_64_after_hwframe
> â page_fault
> â entry_SYSCALL_64
> â do_syscall_64
> â __x64_sys_brk
> â __x64_sys_access
> â __x64_sys_openat
> â __x64_sys_newfstat
> â __x64_sys_mmap
> â __x64_sys_close
> â __x64_sys_read
> â __x64_sys_mprotect
> â __x64_sys_arch_prctl
> â __x64_sys_munmap
> â exit_to_usermode_loop
> â __x64_sys_set_tid_address
> â __x64_sys_set_robust_list
> â __x64_sys_rt_sigaction
> â __x64_sys_rt_sigprocmask
> â __x64_sys_prlimit64
> â __x64_sys_statfs
> â __x64_sys_ioctl
> â __x64_sys_getdents64
> â __x64_sys_write
> â __x64_sys_exit_group
>
> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol")
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> tools/perf/util/thread-stack.c | 30 +++++++++++++++++++++++++++++-
> 1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index 8e390f78486f..f91c00dfe23b 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -637,6 +637,23 @@ static int thread_stack__bottom(struct thread_stack *ts,
> true, false);
> }
>
> +static int thread_stack__pop_ks(struct thread *thread, struct thread_stack *ts,
> + struct perf_sample *sample, u64 ref)
> +{
> + u64 tm = sample->time;
> + int err;
> +
> + /* Return to userspace, so pop all kernel addresses */
> + while (thread_stack__in_kernel(ts)) {
> + err = thread_stack__call_return(thread, ts, --ts->cnt,
> + tm, ref, true);
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> +
> static int thread_stack__no_call_return(struct thread *thread,
> struct thread_stack *ts,
> struct perf_sample *sample,
> @@ -919,7 +936,18 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
> ts->rstate = X86_RETPOLINE_DETECTED;
>
> } else if (sample->flags & PERF_IP_FLAG_RETURN) {
> - if (!sample->ip || !sample->addr)
> + if (!sample->addr) {
> + u32 return_from_kernel = PERF_IP_FLAG_SYSCALLRET |
> + PERF_IP_FLAG_INTERRUPT;
> +
> + if (!(sample->flags & return_from_kernel))
> + return 0;
> +
> + /* Pop kernel stack */
> + return thread_stack__pop_ks(thread, ts, sample, ref);
> + }
> +
> + if (!sample->ip)
> return 0;
>
> /* x86 retpoline 'return' doesn't match the stack */
> --
> 2.17.1

--

- Arnaldo