Re: [GIT PULL 00/11] perf/core improvements and fixes

From: Ingo Molnar
Date: Thu Mar 03 2016 - 03:21:40 EST



* Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:

> Hi Ingo,
>
> Please consider pulling,
>
> - Arnaldo
>
> The following changes since commit 675965b00d734c985e4285f5bec7e524d15fc4e1:
>
> perf: Export perf_event_sysfs_show() (2016-02-29 09:35:27 +0100)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20160229
>
> for you to fetch changes up to 575a02e00b11eecbbabcb1eb22eab4c68e91ae77:
>
> perf record: Ensure return non-zero rc when mmap fail (2016-02-29 12:44:15 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> User visible:
>
> - Check existence of frontend/backed stalled cycles in 'perf stat' (Andi Kleen)
>
> - Avoid installing .o files from tools/lib/ into the python extension (Jiri Olsa)
>
> - Rename the tracepoint '/format' field that carries the syscall ID from 'nr',
> that is also the name of some syscalls arguments, to "__syscall_nr", to
> avoid having multiple fields with the same name, that was breaking the
> python script skeleton generator from perf.data files (Taeung Song)
>
> - Support converting data from bpf events in 'perf data' (Wang Nan)
>
> Infrastructure:
>
> - Split libtraceevent's pevent_print_event() (Steven Rostedt)
>
> - Librarize some 'perf record' bits to allow handling multiple perf.data
> files per session (Wang Nan)
>
> - Ensure return non-zero rc when mmap fail in 'perf record' (Wang Nan)
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
>
> ----------------------------------------------------------------
> Andi Kleen (1):
> perf stat: Check existence of frontend/backed stalled cycles
>
> Jiri Olsa (1):
> perf tools: Fix python extension build
>
> Steven Rostedt (1):
> tools lib traceevent: Split pevent_print_event() into specific functionality functions
>
> Taeung Song (2):
> perf trace: Check and discard not only 'nr' but also '__syscall_nr'
> tracing/syscalls: Rename "/format" tracepoint field name "nr" to "__syscall_nr:
>
> Wang Nan (6):
> perf data: Support converting data from bpf_perf_event_output()
> perf data: Explicitly set byte order for integer types
> perf record: Use WARN_ONCE to replace 'if' condition
> perf record: Extract synthesize code to record__synthesize()
> perf record: Introduce record__finish_output() to finish a perf.data
> perf record: Ensure return non-zero rc when mmap fail
>
> kernel/trace/trace_syscalls.c | 16 ++--
> tools/lib/traceevent/event-parse.c | 136 +++++++++++++++++++++++-------
> tools/lib/traceevent/event-parse.h | 13 +++
> tools/perf/builtin-record.c | 168 ++++++++++++++++++++++---------------
> tools/perf/builtin-stat.c | 22 ++++-
> tools/perf/builtin-trace.c | 8 +-
> tools/perf/util/data-convert-bt.c | 118 +++++++++++++++++++++++++-
> tools/perf/util/setup.py | 4 +
> 8 files changed, 372 insertions(+), 113 deletions(-)

Hm, there's a 'perf stat' regression that I can see:

Before:

triton:~/tip> perf stat -a sleep 1

Performance counter stats for 'system wide':

11990.023100 task-clock (msec) # 11.981 CPUs utilized
8,802 context-switches # 0.734 K/sec
543 cpu-migrations # 0.045 K/sec
97,375 page-faults # 0.008 M/sec
9,854,385,894 cycles # 0.822 GHz
15,274,841,152 stalled-cycles-frontend # 155.01% frontend cycles idle
<not supported> stalled-cycles-backend
9,634,486,137 instructions # 0.98 insn per cycle
# 1.59 stalled cycles per insn
1,818,488,088 branches # 151.667 M/sec
46,365,120 branch-misses # 2.55% of all branches

1.000741599 seconds time elapsed

After:

triton:~/tip> perf stat -a sleep 1

Performance counter stats for 'system wide':

11989.280397 task-clock (msec) # 11.981 CPUs utilized
1299 context-switches # 0.108 K/sec
6 cpu-migrations # 0.001 K/sec
70 page-faults # 0.006 K/sec
127008602 cycles # 0.011 GHz
279538533 stalled-cycles-frontend # 220.09% frontend cycles idle
119213269 instructions # 0.94 insn per cycle
# 2.34 stalled cycles per insn
24166678 branches # 2.016 M/sec
505681 branch-misses # 2.09% of all branches

1.000684278 seconds time elapsed
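
For reference, the derived values in the comment column can be recomputed from the raw counters in both runs. This is only a minimal sketch: the function name `derived_metrics` and the dictionary keys are made up for illustration, and the formulas are inferred from the labels in the output above, not taken from the perf source.

```python
# Raw counter values copied from the two 'perf stat -a sleep 1' runs.
BEFORE = {
    "cycles": 9_854_385_894,
    "stalled_frontend": 15_274_841_152,
    "instructions": 9_634_486_137,
}
AFTER = {
    "cycles": 127_008_602,
    "stalled_frontend": 279_538_533,
    "instructions": 119_213_269,
}


def derived_metrics(counters):
    """Recompute the ratios shown after '#' (hypothetical helper).

    The formulas are assumptions inferred from the column labels.
    """
    cycles = counters["cycles"]
    stalled = counters["stalled_frontend"]
    insns = counters["instructions"]
    return {
        "insn_per_cycle": f"{insns / cycles:.2f}",
        "frontend_idle_pct": f"{100.0 * stalled / cycles:.2f}",
        "stalled_per_insn": f"{stalled / insns:.2f}",
    }


if __name__ == "__main__":
    for label, counters in (("Before", BEFORE), ("After", AFTER)):
        print(label, derived_metrics(counters))
```

Under those assumed formulas the recomputed ratios match the printed ones in both runs, so the difference pointed out here is in how the raw counts are rendered, not in the derived columns.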


... see how the numbers became human-unreadable, losing the big-number separator?

I suspect it's due to the following commit:

fa184776ac27 perf stat: Check existence of frontend/backed stalled cycles
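
As a general illustration of how a thousands separator can silently disappear: locale-aware number formatting only groups digits when the active numeric locale defines a grouping rule, and the "C" locale defines none. The sketch below shows that behaviour with Python's `locale` module, which is analogous to the `'` flag of C's printf(). It assumes 'perf stat' relies on locale-dependent grouping of this kind; it is not a diagnosis of the commit above, and `format_in_c_locale` is a hypothetical helper name.

```python
import locale


def format_in_c_locale(value):
    """Format an integer with grouping requested, but under LC_NUMERIC="C".

    The "C" locale has no thousands separator, so the grouping request
    has no visible effect. The previous locale is restored afterwards.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, "C")
        return locale.format_string("%d", value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


def format_grouped(value):
    """Locale-independent grouping with ',' for comparison."""
    return format(value, ",")


if __name__ == "__main__":
    cycles = 9854385894  # the 'cycles' count from the "Before" run
    print(format_in_c_locale(cycles))  # digits only, no separator
    print(format_grouped(cycles))      # grouped in threes
```

If the numeric locale ends up as "C" at the time the counts are printed, output like the "After" run is what one would expect from this mechanism.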

Thanks,

Ingo