[6.1.7][6.2-rc5] perf all metrics test: FAILED!

From: Sedat Dilek
Date: Sun Jan 29 2023 - 04:59:32 EST


[ CC LLVM/Linux folks + Ben from the Debian kernel team ]

Hi,

I am experimenting with LLVM 16.0.0-rc1, which was released yesterday, and perf.

After building my own LLVM toolchain, I built perf and ran some
perf tests here on my Intel SandyBridge CPU (details below).

perf all metrics test: FAILED!

...with both Debian's perf 6.1.7 and my self-built 6.2-rc5.

Just noticed:

Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc

Running the tests below with `sudo` made this warning go away, but the test still FAILED.

But maybe I forgot to enable some sysfs/debug setting or something else?
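One way to pin down which settings were in effect is to print them next to each test run. A minimal sketch, assuming a POSIX shell; `show_perf_env` and `PROC_SYS` are hypothetical names introduced here, and `PROC_SYS` defaults to the real sysctl directory:

```shell
#!/bin/sh
# Sketch: print settings that commonly affect `perf test` results.
# show_perf_env and PROC_SYS are hypothetical names for illustration.
PROC_SYS="${PROC_SYS:-/proc/sys/kernel}"

show_perf_env() {
    for knob in kptr_restrict perf_event_paranoid; do
        # Each of these sysctl files holds a single integer.
        printf '%s=%s\n' "$knob" "$(cat "$PROC_SYS/$knob")"
    done
    # Locked-memory limits of the current shell (KiB or "unlimited");
    # the rlimit(MEMLOCK) warning refers to this resource.
    printf 'memlock_soft=%s memlock_hard=%s\n' "$(ulimit -S -l)" "$(ulimit -H -l)"
}
```

Calling `show_perf_env` before the plain and the `sudo` runs makes it easy to include the exact values in a report.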

The last perf version that passed:

~/bin/perf -v
perf version 6.0.0

echo "linux-perf: Adjust limited access to performance monitoring and observability operations"
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
0

~/bin/perf test 10 86 92 93 94 95
10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
86: perf record tests : Ok
92: perf stat tests : Ok
93: perf all metricgroups test : Ok
94: perf all metrics test : Ok
95: perf all PMU test : Ok

echo 1 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
echo "linux-perf: Reset limited access to performance monitoring and observability operations"
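The reset step writes 1 to both files regardless of what they held before, which may differ from the original state (for example, the upstream kernel default for perf_event_paranoid is 2). A sketch that remembers and restores the previous values, assuming a POSIX shell; the function names and the `PROC_SYS`/`SUDO` overrides are hypothetical, added so the functions can be tried against a scratch directory:

```shell
#!/bin/sh
# Sketch: relax perf access for a test run, then restore the previous
# values instead of hard-coding 1. PROC_SYS and SUDO are hypothetical
# overrides; by default the real sysctls are written via sudo.
PROC_SYS="${PROC_SYS:-/proc/sys/kernel}"
SUDO="${SUDO-sudo}"

perf_access_open() {
    # Remember the current values before touching anything.
    SAVED_KPTR=$(cat "$PROC_SYS/kptr_restrict") || return 1
    SAVED_PARANOID=$(cat "$PROC_SYS/perf_event_paranoid") || return 1
    echo 0 | $SUDO tee "$PROC_SYS/kptr_restrict" \
        "$PROC_SYS/perf_event_paranoid" >/dev/null
}

perf_access_restore() {
    # Write back exactly what perf_access_open found.
    echo "$SAVED_KPTR" | $SUDO tee "$PROC_SYS/kptr_restrict" >/dev/null
    echo "$SAVED_PARANOID" | $SUDO tee "$PROC_SYS/perf_event_paranoid" >/dev/null
}
```

`SUDO` is left unquoted on purpose so that an empty value simply drops the prefix.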

If you need further information, please let me know.

Thanks.

Regards,
-Sedat-

P.S. Instructions

[ REPRODUCER ]

LLVM_MVER="16"

# Debian LLVM
##LLVM_TOOLCHAIN_PATH="/usr/lib/llvm-${LLVM_MVER}/bin"
# Selfmade LLVM
LLVM_TOOLCHAIN_PATH="/opt/llvm/bin"
if [ -d "${LLVM_TOOLCHAIN_PATH}" ]; then
export PATH="${LLVM_TOOLCHAIN_PATH}:${PATH}"
fi

PYTHON_VER="3.11"
MAKE="make"
MAKE_OPTS="V=1 -j1 HOSTCC=clang-$LLVM_MVER HOSTLD=ld.lld
HOSTAR=llvm-ar CC=clang-$LLVM_MVER LD=ld.lld AR=llvm-ar
STRIP=llvm-strip"

echo "LLVM MVER ........ $LLVM_MVER"
echo "Path settings .... $PATH"
echo "Python version ... $PYTHON_VER"
echo "make line ........ $MAKE $MAKE_OPTS"

LANG=C LC_ALL=C make -C tools/perf clean 2>&1 | tee ../make-log_perf-clean.txt

LANG=C LC_ALL=C $MAKE $MAKE_OPTS -C tools/perf \
PYTHON=python${PYTHON_VER} install-bin 2>&1 | tee \
../make-log_perf-install_bin_python${PYTHON_VER}_llvm${LLVM_MVER}.txt
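Because the reproducer only prepends `LLVM_TOOLCHAIN_PATH` when the directory exists, a missing self-built toolchain would silently fall back to whatever is on `PATH`. A small check, assuming a POSIX shell (`check_llvm_tools` is a hypothetical helper), shows where each tool named in `MAKE_OPTS` resolves:

```shell
#!/bin/sh
# Sketch: report where each given tool resolves on PATH and fail if any
# is missing. check_llvm_tools is a hypothetical helper name.
check_llvm_tools() {
    missing=0
    for tool in "$@"; do
        if where=$(command -v "$tool"); then
            printf '%-12s %s\n' "$tool" "$where"
        else
            printf '%-12s MISSING\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

After the `PATH` setup, `check_llvm_tools "clang-$LLVM_MVER" ld.lld llvm-ar llvm-strip` prints one line per tool and returns non-zero if any of them cannot be found.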


[ TESTS ]

[ TESTS - START ]

echo 0 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid

[ TESTS - DEBIAN ]

/usr/bin/perf -v
perf version 6.1.7

/usr/bin/perf test 10 92 98 99 100 101

10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
92: perf record tests : Ok
98: perf stat tests : Ok
99: perf all metricgroups test : Ok
100: perf all metrics test : FAILED!
101: perf all PMU test : Ok

[ TESTS - DILEKS ]

~/bin/perf -v
perf version 6.2.0-rc5

~/bin/perf test 7 87 93 94 95 96

7: PMU events :
7.1: PMU event table sanity : Ok
7.2: PMU event map aliases : Ok
7.3: Parsing of PMU event table metrics : Ok
7.4: Parsing of PMU event table metrics with fake PMUs : Ok
87: perf record tests : Ok
93: perf stat tests : Ok
94: perf all metricgroups test : Ok
95: perf all metrics test : FAILED!
96: perf all PMU test : Ok
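The same test has a different number in each release (94 in 6.0.0, 100 in 6.1.7, 95 in 6.2.0-rc5). `perf test` also accepts test name substrings, so selecting it by name keeps the command identical across binaries. A sketch; `run_metrics_test` is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: run "perf all metrics test" by name with each given perf binary,
# since test numbers shift between perf releases.
# run_metrics_test is a hypothetical helper name.
run_metrics_test() {
    for perf_bin in "$@"; do
        "$perf_bin" -v
        # perf test selects tests whose name contains this text.
        "$perf_bin" test -v 'perf all metrics test'
    done
}
```

For example: `run_metrics_test /usr/bin/perf ~/bin/perf 2>&1 | tee perf-all-metrics.txt`.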

[ TESTS - FAILED ]

/usr/bin/perf test --verbose 100 2>&1 | tee perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt

~/bin/perf test --verbose 95 2>&1 | tee perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt
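With two verbose logs to compare, the failing metric names can be pulled out mechanically and re-run one at a time with `perf stat -M`. A sketch, assuming a POSIX shell; `failed_metrics`, `rerun_metrics` and the `PERF` override are hypothetical, and the one-second system-wide `sleep` workload is an arbitrary choice:

```shell
#!/bin/sh
# Sketch: list metrics reported as "not printed", then re-run each alone.
# failed_metrics, rerun_metrics and PERF are hypothetical names.
PERF="${PERF:-perf}"

failed_metrics() {
    # Matches lines like: Metric 'tma_dram_bound' not printed in:
    sed -n "s/^ *Metric '\([^']*\)' not printed in:.*/\1/p" "$@" | sort -u
}

rerun_metrics() {
    for m in "$@"; do
        echo "== $m"
        # Count the metric system-wide while sleeping for one second.
        $PERF stat -M "$m" -a sleep 1
    done
}
```

For example, `rerun_metrics $(failed_metrics perf-test-verbose-*.txt)`; for the two logs here the list is tma_dram_bound and tma_l3_bound.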

[ TESTS - STOP ]

echo 1 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid

- EOT -

[ ATTACHMENT - perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt ]

Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
100: perf all metrics test :
--- start ---
test child forked, pid 39432
Testing Average_Frequency
Testing C2_Pkg_Residency
Testing C3_Core_Residency
Testing C3_Pkg_Residency
Testing C6_Core_Residency
Testing C6_Pkg_Residency
Testing C7_Core_Residency
Testing C7_Pkg_Residency
Testing CLKS
Testing CORE_CLKS
Testing CPI
Testing CPU_Utilization
Testing CoreIPC
Testing DRAM_BW_Use
Testing DSB_Coverage
Testing Execute_per_Issue
Testing FLOPc
Testing GFLOPs
Testing ILP
Testing IPC
Testing Instructions
Testing IpFarBranch
Testing Kernel_CPI
Testing Kernel_Utilization
Testing MEM_Parallel_Requests
Testing MEM_Request_Latency
Testing Retire
Testing SLOTS
Testing SMT_2T_Utilization
Testing Turbo_Utilization
Testing UPI
Testing tma_backend_bound
Testing tma_bad_speculation
Testing tma_branch_mispredicts
Testing tma_branch_resteers
Testing tma_core_bound
Testing tma_divider
Testing tma_dram_bound
Metric 'tma_dram_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 203.922 usec (+- 0.191 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.797 usec
Average data synthesis took: 219.730 usec (+- 0.216 usec)
Average num. events: 159.000 (+- 0.000)
Average time per event 1.382 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

4,456375532 seconds time elapsed

1,415829000 seconds user
3,027083000 seconds sys
Testing tma_dsb_switches
Testing tma_dtlb_load
Testing tma_fetch_bandwidth
Testing tma_fetch_latency
Testing tma_fp_arith
Testing tma_fp_scalar
Testing tma_fp_vector
Testing tma_frontend_bound
Testing tma_heavy_operations
Testing tma_itlb_misses
Testing tma_l3_bound
Metric 'tma_l3_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 204.199 usec (+- 0.228 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.807 usec
Average data synthesis took: 219.934 usec (+- 0.232 usec)
Average num. events: 159.000 (+- 0.000)
Average time per event 1.383 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

4,458943453 seconds time elapsed

1,468251000 seconds user
2,976400000 seconds sys
Testing tma_lcp
Testing tma_light_operations
Testing tma_machine_clears
Testing tma_mem_bandwidth
Testing tma_mem_latency
Testing tma_memory_bound
Testing tma_microcode_sequencer
Testing tma_ms_switches
Testing tma_ports_utilization
Testing tma_retiring
Testing tma_store_bound
Testing tma_x87_use
test child finished with -1
---- end ----
perf all metrics test: FAILED!

[ ATTACHMENT - perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt ]

Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
95: perf all metrics test :
--- start ---
test child forked, pid 39198
Testing ILP
Testing tma_core_bound
Testing tma_memory_bound
Testing tma_branch_mispredicts
Testing tma_machine_clears
Testing tma_itlb_misses
Testing IpFarBranch
Testing tma_l3_bound
Metric 'tma_l3_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 208.033 usec (+- 0.214 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.934 usec
Average data synthesis took: 216.728 usec (+- 0.182 usec)
Average num. events: 162.000 (+- 0.000)
Average time per event 1.338 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

4,555228480 seconds time elapsed

1,504137000 seconds user
3,040193000 seconds sys
Testing tma_fp_scalar
Testing tma_fp_vector
Testing tma_x87_use
Testing Execute_per_Issue
Testing GFLOPs
Testing DSB_Coverage
Testing tma_dsb_switches
Testing tma_fetch_bandwidth
Testing tma_branch_resteers
Testing tma_lcp
Testing tma_ms_switches
Testing FLOPc
Testing tma_fetch_latency
Testing CPU_Utilization
Testing DRAM_BW_Use
Testing tma_fp_arith
Testing CPI
Testing MEM_Parallel_Requests
Testing MEM_Request_Latency
Testing tma_mem_bandwidth
Testing tma_dram_bound
Metric 'tma_dram_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 207.680 usec (+- 0.176 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.923 usec
Average data synthesis took: 217.833 usec (+- 0.202 usec)
Average num. events: 161.000 (+- 0.000)
Average time per event 1.353 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

4,555698863 seconds time elapsed

1,481769000 seconds user
3,063387000 seconds sys
Testing tma_store_bound
Testing tma_mem_latency
Testing tma_dtlb_load
Testing tma_microcode_sequencer
Testing Kernel_CPI
Testing Kernel_Utilization
Testing tma_frontend_bound
Testing CLKS
Testing Retire
Testing UPI
Testing tma_ports_utilization
Testing Average_Frequency
Testing C2_Pkg_Residency
Testing C3_Core_Residency
Testing C3_Pkg_Residency
Testing C6_Core_Residency
Testing C6_Pkg_Residency
Testing C7_Core_Residency
Testing C7_Pkg_Residency
Testing Turbo_Utilization
Testing CoreIPC
Testing IPC
Testing tma_heavy_operations
Testing tma_light_operations
Testing CORE_CLKS
Testing SMT_2T_Utilization
Testing Socket_CLKS
Testing UNCORE_FREQ
Testing Instructions
Testing tma_backend_bound
Testing tma_bad_speculation
Testing tma_retiring
Testing tma_divider
Testing SLOTS
test child finished with -1
---- end ----
perf all metrics test: FAILED!