[RFC] perf tool improvement requests

From: Stephane Eranian
Date: Mon Sep 03 2018 - 22:46:04 EST


Hi Arnaldo, Jiri,

A few weeks ago, you had asked if I had more requests for the perf tool.
I have put together the following list to improve the usability of the
perf tool, at least for our usage. Nothing is very big, just small
improvements here and there.

1/ perf stat interval printing

Today, the timestamp printed via perf stat -I is relative to the
start of the measurements. It would be beneficial to also support a
mode where it uses a time source that can be synchronized with other
traces or profiles, for instance gettimeofday() or
clock_gettime(CLOCK_MONOTONIC).

2/ perf report event grouping

If you do:
$ perf record -e '{ cycles, instructions, branches }' ....
$ perf report
It will show the 3 profiles together, which is VERY useful. However,
the output is confusing because it is hard to tell which % corresponds
to which event. I know it is the command-line order, but it would be
good to have a header on the columns naming the events, instead of
guessing. A few times, I had to resort to perf report --header-only to
figure out the event order. I discovered the 'i' key on the function
profile, but it is still hard to find the events, especially if you
passed many of them.

3/ annotate output of loops

Percent│401f00: xor %eax,%eax
       │401f02: test %edi,%edi
       │401f04: ↓ jle 401f2b <triad+0x2b>
       │401f06: nopw %cs:0x0(%rax,%rax,1)
 34.20 │401f1┌─→ movsd (%rcx,%rax,8),%xmm1
 14.60 │401f1│: mulsd %xmm0,%xmm1
 33.24 │401f1│: addsd (%rdx,%rax,8),%xmm1
  9.98 │401f1│: movsd %xmm1,(%rsi,%rax,8)
  0.10 │401f2│: add $0x1,%rax
  0.03 │401f2┌── cmp %eax,%edi
  7.84 │401f2├──↑ jg 401f10 <triad+0x10>
       │401f2b: mov $0x18,%eax
       │401f30: ← retq

The loop arrows cut through the code addresses: the last digit of every
address inside the loop is overwritten. That is annoying!

4/ sorting and event groups

If I do:
$ perf record -e '{cycles,instructions}'
$ perf report
It will sort the samples based on the first event (the leader) of the
group. Yet here all events are sampling events, so you could just as
well sort on the second event. But I don't think perf report supports
sorting on multiple events. Both are from the same sort category: syms
(or ip).

Right now, I would have to collect another profile:
$ perf record -e '{instructions,cycles}'
$ perf report

5/ cgroups

Today, to measure multiple events in the same cgroup, you need to do:
$ perf stat -e cycles,branches,instructions -G foo,foo,foo .....

You need to specify the cgroup N times for N events. It would be
good to support a mode where you'd have to specify each cgroup only once:

$ perf stat -e cycles,branches,instructions --cgroup-all foo,bar

This would measure cycles, branches, and instructions for both cgroups
foo and bar.


6/ perf script ip vs. callchain

I already submitted this request separately. It is about
providing a way to generate the callchain separately from the ip in
perf script. Right now, they are lumped together, which is not always
useful. Also, right now the callchain is printed as multi-line output,
which is hard to post-process. perf script should stick with one line
per sample, at least when symbolization is off. brstack is an example
of a field that already does this.

I may have more requests, but I wanted to start with these for now.
Thanks for your efforts.