RE: [PATCHv3 00/56] perf stat: Add scripting support

From: Liang, Kan
Date: Thu Oct 08 2015 - 14:37:53 EST



Hi Jirka

I'm doing some test with the perf/stat_script tree.
Here are the issues I found so far.

1 test case failed to build:

CC tests/stat.o
tests/stat.c: In function âtest__synthesize_statâ:
tests/stat.c:82:3: error: missing initializer for field âenaâ of
âstruct <anonymous>â [-Werror=missing-field-initializers]
.ena = 200,
^
In file included from tests/stat.c:5:0:
/home/lk/group_read/test/perf/tools/perf/util/counts.h:10:8:
note: âenaâ declared here
u64 ena;
^
tests/stat.c:83:3: error: missing initializer for field ârunâ of
âstruct <anonymous>â [-Werror=missing-field-initializers]
.run = 300,
^
In file included from tests/stat.c:5:0:
/home/lk/group_read/test/perf/tools/perf/util/counts.h:11:8:
note: ârunâ declared here
u64 run;
^
cc1: all warnings being treated as errors
mv: cannot stat âtests/.stat.o.tmpâ: No such file or directory
make[3]: *** [tests/stat.o] Error 1
make[2]: *** [tests] Error 2
make[1]: *** [perf-in.o] Error 2
make: *** [all] Error 2


2. it still doesn't work well with uncore event.

Here is an example:
sudo ./perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 100
--per-socket record -- sleep 1
# time socket cpus counts unit events
0.100231142 S0 1 0.04 MiB uncore_imc_1/cas_count_read/
0.100231142 S1 1 0.08 MiB uncore_imc_1/cas_count_read/
0.200583556 S0 1 0.02 MiB uncore_imc_1/cas_count_read/
0.200583556 S1 1 0.02 MiB uncore_imc_1/cas_count_read/

sudo ./perf stat report --per-socket
# time socket cpus counts unit events
0.100231142 S0 36 0.12 MiB uncore_imc_1/cas_count_read/
0.100231142 S1 28 <not counted>
MiB uncore_imc_1/cas_count_read/
0.200583556 S0 36 0.04 MiB uncore_imc_1/cas_count_read/
0.200583556 S1 28 <not counted>
MiB uncore_imc_1/cas_count_read/

The results are not consistent.

I checked the perf report -D result. It looks the wrong CPU id is stored.
In my system, the uncore event should be monitored in CPU 0 and CPU 19
But it shows CPU0 and CPU1.

0x520 [0x30]: PERF_RECORD_STAT
... id 22385, cpu 0, thread 0

0x550 [0x30]: PERF_RECORD_STAT
... id 22386, cpu 1, thread 0


3. I think you have already handled new record type for perf stat record.
But the "unhandled!" message still show in the perf report -D.

0x598 [0x30]: PERF_RECORD_STAT
... id 22385, cpu 0, thread 0
... value 1752, enabled 501473097, running 501473097
: unhandled! <-------


4. Core dump when applying --per-core -a for stat record.

sudo ./perf stat -e 'cycles' --per-core -a record sleep 1

** Error in `./perf': corrupted double-linked list: 0x00000000036eb560 **
======= Backtrace: =========
/lib64/libc.so.6[0x3002e7bbe7]
/lib64/libc.so.6[0x3002e7d2b5]
./perf(perf_env__exit+0xfd)[0x484d1d]
./perf[0x47c45f]
./perf(main+0x610)[0x421ac0]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3002e21b45]
./perf[0x421bd9]


5. perf stat record may not monitor on specific CPU/whole-system.
However, perf stat report can be set to aggregate per core/socket.
If so, there is problem to show the data.
I think we need some warning here.

For example,

sudo ./perf stat -e 'cycles' record sleep 1

Performance counter stats for 'sleep 1':

858,928 cycles

1.001120735 seconds time elapsed

sudo ./perf stat report -A

Performance counter stats for '/home/lk/group_read/test/perf/
tools/perf/perf stat -e cycles record sleep 1':

CPU-1 858,928 cycles

1.001120735 seconds time elapsed



Thanks,
Kan

> hi,
> sending another version of stat scripting.
>
> v3 changes:
> - added attr update event to handle unit,scale,name for event
> it fixed the uncore_imc_1/cas_count_read/ record/report
> - perf report -D now displays stat related events
> - some minor and changelog fixes
>
> v2 changes:
> - rebased to latest Arnaldo's perf/core
> - patches 1 to 11 already merged in
> - added --per-core/--per-socket/-A options for perf stat report
> command to allow custom aggregation in stat report, please
> check new examples below
> - couple changelogs changes
>
> The initial attempt defined its own formula lang and allowed triggering
> user's script on the end of the stat command:
> http://marc.info/?l=linux-kernel&m=136742146322273&w=2
>
> This patchset abandons the idea of new formula language and rather adds
> support to:
> - store stat data into perf.data file
> - add python support to process stat events
>
> Basically it allows to store stat data into perf.data and post process it with
> python scripts in a similar way we do for sampling data.
>
> The stat data are stored in new stat, stat-round, stat-config user events.
> stat - stored for each read syscall of the counter
> stat round - stored for each interval or end of the command invocation
> stat config - stores all the config information needed to process data
> so report tool could restore the same output as record
>
> The python script can now define 'stat__<eventname>_<modifier>'
> functions to get stat events data and 'stat__interval' to get stat-round data.
>
> See CPI script example in scripts/python/stat-cpi.py.
>
> Also available in:
> git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> perf/stat_script
>
> thanks,
> jirka
>
>
> Examples:
>
> - To record data for command stat workload:
>
> $ perf stat record kill
> ...
>
> Performance counter stats for 'kill':
>
> 0.372007 task-clock (msec) # 0.613 CPUs utilized
> 3 context-switches # 0.008 M/sec
> 0 cpu-migrations # 0.000 K/sec
> 62 page-faults # 0.167 M/sec
> 1,129,973 cycles # 3.038 GHz
> <not supported> stalled-cycles-frontend
> <not supported> stalled-cycles-backend
> 813,313 instructions # 0.72 insns per cycle
> 166,161 branches # 446.661 M/sec
> 8,747 branch-misses # 5.26% of all branches
>
> 0.000607287 seconds time elapsed
>
> - To report perf stat data:
>
> $ perf stat report
>
> Performance counter stats for '/home/jolsa/bin/perf stat record kill':
>
> 0.372007 task-clock (msec) # inf CPUs utilized
> 3 context-switches # 0.008 M/sec
> 0 cpu-migrations # 0.000 K/sec
> 62 page-faults # 0.167 M/sec
> 1,129,973 cycles # 3.038 GHz
> <not supported> stalled-cycles-frontend
> <not supported> stalled-cycles-backend
> 813,313 instructions # 0.72 insns per cycle
> 166,161 branches # 446.661 M/sec
> 8,747 branch-misses # 5.26% of all branches
>
> 0.000000000 seconds time elapsed
>
> - To store system-wide period stat data:
>
> $ perf stat -e cycles:u,instructions:u -a -I 1000 record
> # time counts unit events
> 1.000265471 462,311,482 cycles:u (100.00%)
> 1.000265471 590,037,440 instructions:u
> 2.000483453 722,532,336 cycles:u (100.00%)
> 2.000483453 848,678,197 instructions:u
> 3.000759876 75,990,880 cycles:u (100.00%)
> 3.000759876 86,187,813 instructions:u
> ^C 3.213960893 85,329,533 cycles:u (100.00%)
> 3.213960893 135,954,296 instructions:u
>
> - To report perf stat data:
>
> $ perf stat report
> # time counts unit events
> 1.000265471 462,311,482 cycles:u (100.00%)
> 1.000265471 590,037,440 instructions:u
> 2.000483453 722,532,336 cycles:u (100.00%)
> 2.000483453 848,678,197 instructions:u
> 3.000759876 75,990,880 cycles:u (100.00%)
> 3.000759876 86,187,813 instructions:u
> 3.213960893 85,329,533 cycles:u (100.00%)
> 3.213960893 135,954,296 instructions:u
>
> - To run stat-cpi.py script over perf.data:
>
> $ perf script -s scripts/python/stat-cpi.py
> 1.000265: cpu -1, thread -1 -> cpi 0.783529 (462311482/590037440)
> 2.000483: cpu -1, thread -1 -> cpi 0.851362 (722532336/848678197)
> 3.000760: cpu -1, thread -1 -> cpi 0.881689 (75990880/86187813)
> 3.213961: cpu -1, thread -1 -> cpi 0.627634 (85329533/135954296)
>
> - To pipe data from stat to stat-cpi script:
>
> $ perf stat -e cycles:u,instructions:u -A -C 0 -I 1000 record | perf script -s
> scripts/python/stat-cpi.py
> 1.000192: cpu 0, thread -1 -> cpi 0.739535 (23921908/32347236)
> 2.000376: cpu 0, thread -1 -> cpi 1.663482 (2519340/1514498)
> 3.000621: cpu 0, thread -1 -> cpi 1.396308 (16162767/11575362)
> 4.000700: cpu 0, thread -1 -> cpi 1.092246 (20077258/18381624)
> 5.000867: cpu 0, thread -1 -> cpi 0.473816 (45157586/95306156)
> 6.001034: cpu 0, thread -1 -> cpi 0.532792 (43701668/82023818)
> 7.001195: cpu 0, thread -1 -> cpi 1.122059 (29890042/26638561)
>
> - Raw script stat data output:
>
> $ perf stat -e cycles:u,instructions:u -A -C 0 -I 1000 record | perf --no-
> pager script
> CPU THREAD VAL ENA RUN TIME EVENT
> 0 -1 12302059 1000811347 1000810712 1000198821 cycles:u
> 0 -1 2565362 1000823218 1000823218 1000198821
> instructions:u
> 0 -1 14453353 1000812704 1000812704 2000382283 cycles:u
> 0 -1 4600932 1000799342 1000799342 2000382283
> instructions:u
> 0 -1 15245106 1000774425 1000774425 3000538255 cycles:u
> 0 -1 2624324 1000769310 1000769310 3000538255
> instructions:u
>
> - To display different aggregation in report:
>
> $ perf stat -e cycles -a -I 1000 record sleep 3
> # time counts unit events
> 1.000223609 703,427,617 cycles
> 2.000443651 609,975,307 cycles
> 3.000569616 668,479,597 cycles
> 3.000735323 1,155,816 cycles
>
> $ perf stat report
> # time counts unit events
> 1.000223609 703,427,617 cycles
> 2.000443651 609,975,307 cycles
> 3.000569616 668,479,597 cycles
> 3.000735323 1,155,816 cycles
>
> $ perf stat report --per-core
> # time core cpus counts unit events
> 1.000223609 S0-C0 2 327,612,412 cycles
> 1.000223609 S0-C1 2 375,815,205 cycles
> 2.000443651 S0-C0 2 287,462,177 cycles
> 2.000443651 S0-C1 2 322,513,130 cycles
> 3.000569616 S0-C0 2 271,571,908 cycles
> 3.000569616 S0-C1 2 396,907,689 cycles
> 3.000735323 S0-C0 2 694,977 cycles
> 3.000735323 S0-C1 2 460,839 cycles
>
> $ perf stat report --per-socket
> # time socket cpus counts unit events
> 1.000223609 S0 4 703,427,617 cycles
> 2.000443651 S0 4 609,975,307 cycles
> 3.000569616 S0 4 668,479,597 cycles
> 3.000735323 S0 4 1,155,816 cycles
>
> $ perf stat report -A
> # time CPU counts unit events
> 1.000223609 CPU0 205,431,505 cycles
> 1.000223609 CPU1 122,180,907 cycles
> 1.000223609 CPU2 176,649,682 cycles
> 1.000223609 CPU3 199,165,523 cycles
> 2.000443651 CPU0 148,447,922 cycles
> 2.000443651 CPU1 139,014,255 cycles
> 2.000443651 CPU2 204,436,559 cycles
> 2.000443651 CPU3 118,076,571 cycles
> 3.000569616 CPU0 149,788,954 cycles
> 3.000569616 CPU1 121,782,954 cycles
> 3.000569616 CPU2 247,277,700 cycles
> 3.000569616 CPU3 149,629,989 cycles
> 3.000735323 CPU0 269,675 cycles
> 3.000735323 CPU1 425,302 cycles
> 3.000735323 CPU2 364,169 cycles
> 3.000735323 CPU3 96,670 cycles
>
>
> Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
> Cc: Ulrich Drepper <drepper@xxxxxxxxx>
> Cc: Will Deacon <will.deacon@xxxxxxx>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> Cc: Don Zickus <dzickus@xxxxxxxxxx>
> ---
> Jiri Olsa (56):
> perf tools: Add thread_map event
> perf tools: Add thread_map event sythesize function
> perf tools: Add thread_map__new_event function
> perf tools: Add perf_event__fprintf_thread_map function
> perf tools: Add cpu_map event
> perf tools: Add cpu_map event synthesize function
> perf tools: Add cpu_map__new_event function
> perf tools: Add perf_event__fprintf_cpu_map function
> perf tools: Add stat config event
> perf tools: Add stat config event synthesize function
> perf tools: Add stat config event read function
> perf tools: Add stat event
> perf tools: Add stat event synthesize function
> perf tools: Add stat event read function
> perf tools: Add stat round event
> perf tools: Add stat round event synthesize function
> perf tools: Add stat events fprintf functions
> perf tools: Add attr_update event
> perf tools: Add attr_update event unit type
> perf tools: Add attr_update event scale type
> perf tools: Add attr_update event name type
> perf tools: Add perf_event__fprintf_attr_update function
> perf report: Display newly added events in raw dump
> perf tools: Introduce stat feature
> perf tools: Move id_offset out of struct perf_evsel union
> perf stat: Rename perf_stat struct into perf_stat_evsel
> perf stat: Add AGGR_UNSET mode
> perf stat record: Add record command
> perf stat record: Initialize record features
> perf stat record: Synthesize stat record data
> perf stat record: Store events IDs in perf data file
> perf stat record: Add pipe support for record command
> perf stat record: Write stat events on record
> perf stat record: Write stat round events on record
> perf stat record: Do not allow record with multiple runs mode
> perf stat record: Synthesize attr update events
> perf tools: Add cpu_map__empty_new interface
> perf tools: Make cpu_map__build_map global
> perf tools: Add data arg to cpu_map__build_map callback
> perf stat report: Cache aggregated map entries in extra cpumap
> perf stat report: Add report command
> perf stat report: Process cpu/threads maps
> perf stat report: Process stat config event
> perf stat report: Add support to initialize aggr_map from file
> perf stat report: Process stat and stat round events
> perf stat report: Process attr update events
> perf stat report: Move csv_sep initialization before report command
> perf stat report: Allow to override aggr_mode
> perf script: Check output fields only for samples
> perf script: Process cpu/threads maps
> perf script: Process stat config event
> perf script: Add process_stat/process_stat_interval scripting interface
> perf script: Add stat default handlers
> perf script: Display stat events by default
> perf script: Add python support for stat events
> perf script: Add stat-cpi.py script
>
> tools/perf/Documentation/perf-stat.txt | 34 ++
> tools/perf/builtin-record.c | 2 +
> tools/perf/builtin-script.c | 144 ++++-
> tools/perf/builtin-stat.c | 636 ++++++++++++++++++++-
> tools/perf/scripts/python/stat-cpi.py | 74 +++
> tools/perf/tests/Build | 3 +
> tools/perf/tests/attr_update.c | 82 +++
> tools/perf/tests/builtin-test.c | 25 +
> tools/perf/tests/cpumap.c | 39 ++
> tools/perf/tests/stat.c | 111 ++++
> tools/perf/tests/tests.h | 7 +
> tools/perf/tests/thread-map.c | 43 ++
> tools/perf/tests/topology.c | 4 +-
> tools/perf/util/cpumap.c | 61 +-
> tools/perf/util/cpumap.h | 11 +-
> tools/perf/util/event.c | 205 +++++++
> tools/perf/util/event.h | 122 +++-
> tools/perf/util/evlist.c | 6 +-
> tools/perf/util/evlist.h | 3 +
> tools/perf/util/evsel.h | 2 +-
> tools/perf/util/header.c | 146 +++++
> tools/perf/util/header.h | 14 +
> .../util/scripting-engines/trace-event-python.c | 114 +++-
> tools/perf/util/session.c | 163 ++++++
> tools/perf/util/stat.c | 75 ++-
> tools/perf/util/stat.h | 13 +-
> tools/perf/util/thread_map.c | 27 +
> tools/perf/util/thread_map.h | 3 +
> tools/perf/util/tool.h | 8 +-
> tools/perf/util/trace-event.h | 4 +
> 30 files changed, 2132 insertions(+), 49 deletions(-) create mode 100644
> tools/perf/scripts/python/stat-cpi.py
> create mode 100644 tools/perf/tests/attr_update.c create mode 100644
> tools/perf/tests/cpumap.c create mode 100644 tools/perf/tests/stat.c
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/