Re: `perf report` about 1000x(!) slower in linux 4.15

From: Jan-Oliver Kaiser
Date: Tue Mar 20 2018 - 12:54:15 EST


The behavior persists with the most recent head of linux/master (1b5f3ba415fe4cf8b8b39c8d104ed44cde330658).

$ ./perf --version
perf version 4.16.rc6.g1b5f3ba4

$ uname -r
4.15.9-towo.1-siduction-amd64

(This is a Debian unstable variant.)

$ ./perf report --header-only -i <my_perf.data>
# ========
# captured on: Fri Mar 16 18:14:05 2018
# hostname : blackbox
# os release : 4.15.9-towo.1-siduction-amd64
# perf version : 4.15.4
# arch : x86_64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
# cpuid : GenuineIntel,6,61,4
# total memory : 16343572 kB
# cmdline : /usr/bin/perf_4.15 record -F 99 --call-graph dwarf -- coqc -q -I /home/janno/.opam/iris-mtac2/lib/coq//user-contrib/Unicoq -I src -Q tests Mtac2Tests -R theories Mtac2 timings/decapp_vs_mmatch.v
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 99, sample_type = IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|DATA_SRC, disabled = 1, inherit = 1, exclude_kernel = 1, mma$
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: intel_pt = 6, uncore_arb = 11, cstate_pkg = 14, breakpoint = 5, uncore_cbox_1 = 10, power = 12, cpu = 4, software = 1, uncore_imc = 8, uncore_cbox_0 = 9, cstate_core = 13, msr = 7
# CACHE info available, use -I to display
# missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT SAMPLE_TIME
# ========
#
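
For anyone who wants to reproduce the comparison, here is a rough sketch of how the default behavior and the `--no-inline` workaround could be timed against the same file. The `PERF` and `DATA` variables and the `time_report` helper are illustrative names, not part of perf; the script assumes `date +%s` is available and only measures whole seconds, which should be enough for a slowdown of this size.

```shell
#!/bin/sh
# Sketch: compare `perf report` run time with and without inline-frame handling.
# PERF and DATA are assumptions; point them at your perf binary and data file.
PERF="${PERF:-perf}"
DATA="${DATA:-perf.data}"

# time_report [extra perf-report options...]
# Prints the elapsed wall-clock time in whole seconds.
time_report() {
    start=$(date +%s)
    "$PERF" report --stdio -i "$DATA" "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo "$((end - start))"
}

if command -v "$PERF" > /dev/null 2>&1 && [ -r "$DATA" ]; then
    echo "default:     $(time_report) s"
    echo "--no-inline: $(time_report --no-inline) s"
else
    echo "skipped: $PERF or $DATA not available"
fi
```

Run it as, for example, `PERF=./perf DATA=<my_perf.data> sh compare.sh` (the script name is arbitrary).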

Best,
Janno

On 03/20/2018 02:38 PM, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 20, 2018 at 12:57:29PM +0100, Jan-Oliver Kaiser wrote:
>> After upgrading my system to linux 4.15 (from 4.14), `perf report` became
>> unusably slow. I estimate a slowdown by a factor of 100x-1000x. Some 21M
>> perf.data files take about 30 seconds in the "Processing events" step.
>> `git bisect` points to
>>
>> commit d8a88dd243a170a226aba33e7c53704db2f82aa6 (HEAD, refs/bisect/bad)
>> Author: Milian Wolff <milian.wolff@xxxxxxxx>
>>
>>     perf util: Enable handling of inlined frames by default
>>
>> The slowdown can be worked around with `--no-inline`. If the slowdown is
>> expected, I would suggest reverting the default setting here or maybe
>> printing a warning if a lot of time is spent on this feature.
>>
>> Do you need any additional information about my system or the recorded data
>> I am looking at?
>
> Can you try with the latest perf tool?
>
> [acme@jouet perf]$ make perf-tarxz-src-pkg ; ls -la perf-4*
> TAR
> PERF_VERSION = 4.16.rc6.gecd380
> -rw-rw-r--. 1 acme acme 1323568 Mar 20 10:30 perf-4.16.0-rc6.tar.xz
> [acme@jouet perf]$
>
> You can build this from recently checked out kernel sources or, as a
> convenience, I'm pushing it to:
>
> http://vger.kernel.org/~acme/perf/perf-4.16.0-rc6.tar.xz
>
> You just expand it and then:
>
> [acme@jouet tmp]$ tar xf perf-4.16.0-rc6.tar.xz
> [acme@jouet tmp]$ cd perf-4.16.0-rc6/
> [acme@jouet perf-4.16.0-rc6]$ make -C tools/perf install-bin
>
> And check if the problem is present there as well.
>
> If it is, please tell us what your distro is and send the output of:
>
> perf report --header-only
>
> Thanks,
>
> - Arnaldo