perf tool: About tests debug level

From: John Garry
Date: Mon Jun 21 2021 - 07:26:47 EST


Hi guys,

I noticed that when it was discovered recently that the new icelake JSON did not pass "PMU events" test, running the test with -v makes pr_err() and pr_debug() come out at the same level, so it's hard to distinguish the important logs.

Here is a link:
https://lore.kernel.org/lkml/YLdq%2FH8CXYgHWzCL@xxxxxxxxxx/

And here is an extract:

parsing '(cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc) * msr@tsc@ / 1000000000 / duration_time'
parsing '( ( 1 * ( fp_arith_inst_retired.scalar_single + fp_arith_inst_retired.scalar_double ) + 2 * fp_arith_inst_retired.128b_packed_double + 4 * ( fp_arith_inst_retired.128b_packed_single + fp_arith_inst_retired.256b_packed_double ) + 8 * ( fp_arith_inst_retired.256b_packed_single + fp_arith_inst_retired.512b_packed_double ) + 16 * fp_arith_inst_retired.512b_packed_single ) / 1000000000 ) / duration_time'
parsing 'cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc'
parsing '1 - cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_distributed'
parsing 'cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread'
parsing '( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time'
parsing '1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@ / cha@event\=0x35\,umask\=0x21\,config\=0x40433@ ) / ( cha_0@event\=0x0@ / duration_time )'
parsing 'cha@event\=0x36\,umask\=0x21\,config\=0x40433@ / cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@'
parsing '( 1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@_pmm / cha@event\=0x35\,umask\=0x21\,config\=0x40433@_pmm ) / cha_0@event\=0x0@ )' [pr_debug]
check_parse_fake failed [pr_err]
test child finished with -1
---- end ----

I annotated in [], above

As for another issue, if you consider "Parse and process metrics", debug from core code comes out at same level as test code, e.g. with -v, we see pr_debug() from test code and core code. Again, this makes it hard to distinguish various levels. As an illustration, here I hack the code to fail a test:

sudo ./perf test -v 68
68: Parse and process metrics :
--- start ---
test child forked, pid 9747
metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
found event inst_retired.any verbose=1 pr_debug
found event inst_retired.any verbose=1 pr_err
found event cpu_clk_unhalted.thread verbose=1 pr_debug
found event cpu_clk_unhalted.thread verbose=1 pr_err
adding {inst_retired.any,cpu_clk_unhalted.thread}:W
FAILED tests/parse-metric.c:223 IPC failed, wrong ratio
FAILED tests/parse-metric.c:374 IPC failed
test child finished with -1
---- end ----
Parse and process metrics: FAILED!

Note that the "FAILED" messages from the test code come from pr_debug().

In a way, I feel that pr_debug()/err from the test is more important than pr_debug() from the core code (when running a test).

Any opinion on this or how to improve (if anyone agrees with me)? Or am I missing something? Or is it not so important?

Thanks,
John