[PATCH 10/17] perf report: Prefer DWARF callstacks to LBR ones when captured both
From: Arnaldo Carvalho de Melo
Date: Tue Aug 20 2019 - 15:28:37 EST
From: Alexey Budankov <alexey.budankov@xxxxxxxxxxxxxxx>
Display DWARF based callchains when the perf.data file contains raw thread
stack data as LBR callstack data.
Commiter testing:
This changes the output from the branch stack based one, i.e. without
this patch, for the same file as in the previous csets:
# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles'
# Event count (approx.): 13
#
# Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
# ........ ....... .................... ........................... ......................................... ..................
#
7.69% ls libpthread-2.29.so [.] _init [.] __pthread_initialize_minimal_internal 6827
7.69% ls ld-2.29.so [k] _start [k] _dl_start -
7.69% ls ld-2.29.so [.] _dl_start_user [.] _dl_init -24790
7.69% ls ld-2.29.so [k] _dl_start [k] _dl_sysdep_start 278
7.69% ls ld-2.29.so [k] dl_main [k] _dl_map_object_deps 15581
7.69% ls ld-2.29.so [k] open_verify.constprop.0 [k] lseek64 4228
7.69% ls ld-2.29.so [k] _dl_map_object [k] open_verify.constprop.0 55
7.69% ls ld-2.29.so [k] openaux [k] _dl_map_object 67
7.69% ls ld-2.29.so [k] _dl_map_object_deps [k] 0x00007f441b57c090 112
7.69% ls ld-2.29.so [.] call_init.part.0 [.] _init 334
7.69% ls ld-2.29.so [.] _dl_init [.] call_init.part.0 383
7.69% ls ld-2.29.so [k] _dl_sysdep_start [k] dl_main 45
7.69% ls ld-2.29.so [k] _dl_catch_exception [k] openaux 116
#
# (Tip: For memory address profiling, try: perf mem record / perf mem report)
#
To the one that shows call chains:
# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 10 of event 'cycles'
# Event count (approx.): 3204047
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... .................. .........................................
#
55.01% 0.00% ls [kernel.vmlinux] [k] entry_SYSCALL_64_after_hwframe
|
---entry_SYSCALL_64_after_hwframe
do_syscall_64
|
--16.01%--__x64_sys_execve
__do_execve_file.isra.0
search_binary_handler
load_elf_binary
elf_map
vm_mmap_pgoff
do_mmap
mmap_region
perf_event_mmap
perf_iterate_sb
perf_iterate_ctx
perf_event_mmap_output
perf_output_copy
memcpy_erms
55.01% 39.00% ls [kernel.vmlinux] [k] do_syscall_64
|
|--39.00%--0xffffffffffffffff
| _dl_map_object
| open_verify.constprop.0
| __lseek64 (inlined)
| entry_SYSCALL_64_after_hwframe
| do_syscall_64
|
--16.01%--do_syscall_64
__x64_sys_execve
__do_execve_file.isra.0
search_binary_handler
load_elf_binary
elf_map
vm_mmap_pgoff
do_mmap
mmap_region
perf_event_mmap
perf_iterate_sb
perf_iterate_ctx
perf_event_mmap_output
perf_output_copy
memcpy_erms
42.95% 42.95% ls libpthread-2.29.so [.] __pthread_initialize_minimal_internal
|
---_init
__pthread_initialize_minimal_internal
42.95% 0.00% ls libpthread-2.29.so [.] _init
|
---_init
__pthread_initialize_minimal_internal
<SNIP>
#
# (Tip: Profiling branch (mis)predictions with: perf record -b / perf report)
#
#
The branch stack view be explicitely selected using:
# perf report -h branch-stack
Usage: perf report [<options>]
-b, --branch-stack use branch records for per branch histogram filling
#
I.e. after this patch:
# perf report -b --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles'
# Event count (approx.): 13
#
# Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
# ........ ....... .................... ........................... ......................................... ..................
#
7.69% ls libpthread-2.29.so [.] _init [.] __pthread_initialize_minimal_internal 6827
7.69% ls ld-2.29.so [k] _start [k] _dl_start -
7.69% ls ld-2.29.so [.] _dl_start_user [.] _dl_init -24790
7.69% ls ld-2.29.so [k] _dl_start [k] _dl_sysdep_start 278
7.69% ls ld-2.29.so [k] dl_main [k] _dl_map_object_deps 15581
7.69% ls ld-2.29.so [k] open_verify.constprop.0 [k] lseek64 4228
7.69% ls ld-2.29.so [k] _dl_map_object [k] open_verify.constprop.0 55
7.69% ls ld-2.29.so [k] openaux [k] _dl_map_object 67
7.69% ls ld-2.29.so [k] _dl_map_object_deps [k] 0x00007f441b57c090 112
7.69% ls ld-2.29.so [.] call_init.part.0 [.] _init 334
7.69% ls ld-2.29.so [.] _dl_init [.] call_init.part.0 383
7.69% ls ld-2.29.so [k] _dl_sysdep_start [k] dl_main 45
7.69% ls ld-2.29.so [k] _dl_catch_exception [k] openaux 116
#
# (Tip: Show current config key-value pairs: perf config --list)
#
#
Signed-off-by: Alexey Budankov <alexey.budankov@xxxxxxxxxxxxxxx>
Tested-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Cc: Jin Yao <yao.jin@xxxxxxxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/ccbd9583-82f4-dec5-7e84-64bf56e351fb@xxxxxxxxxxxxxxx
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
---
tools/perf/builtin-report.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 5e003d02821e..79dfb1139f94 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1281,6 +1281,8 @@ int cmd_report(int argc, const char **argv)
has_br_stack = perf_header__has_feat(&session->header,
HEADER_BRANCH_STACK);
+ if (perf_evlist__combined_sample_type(session->evlist) & PERF_SAMPLE_STACK_USER)
+ has_br_stack = false;
setup_forced_leader(&report, session->evlist);
--
2.21.0