Re: [PATCH] perf stat: Support per-cluster aggregation

From: Jie Zhan
Date: Thu Mar 23 2023 - 22:34:41 EST



On 13/03/2023 16:59, Yicong Yang wrote:
From: Yicong Yang <yangyicong@xxxxxxxxxxxxx>

Some platforms have 'cluster' topology and CPUs in the cluster will
share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Jacobsville). Parsing and building the cluster
topology has been supported since [1].

perf stat already supports aggregation for other topology levels such
as die or socket. It would be useful to aggregate per-cluster to find
problems like L3T bandwidth contention or imbalance.

This patch adds support for "--per-cluster" option for per-cluster
aggregation. Also update the docs and related test. The output will
be like:

[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5

Performance counter stats for 'system wide':

S56-D0-CLS158     4  1,321,521,570  LLC-load
S56-D0-CLS594     4    794,211,453  LLC-load
S56-D0-CLS1030    4         41,623  LLC-load
S56-D0-CLS1466    4         41,646  LLC-load
S56-D0-CLS1902    4         16,863  LLC-load
S56-D0-CLS2338    4         15,721  LLC-load
S56-D0-CLS2774    4         22,671  LLC-load
[...]

[1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die")

Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>

An end user may have to check sysfs to figure out which CPUs each of those cluster IDs covers.
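For reference, the manual sysfs lookup can be sketched roughly as below. This is only an illustration, not part of the patch: the helper name cluster_map is made up, it assumes the kernel exposes physical_package_id, die_id and cluster_id under each CPU's topology directory, and it reports a missing or unreadable ID as -1, which may not match how perf normalizes such values in its own labels.

```python
import glob
import os
from collections import defaultdict


def cluster_map(sysfs_root="/sys/devices/system/cpu"):
    """Return {(package_id, die_id, cluster_id): sorted list of CPU numbers}.

    Hypothetical helper for illustration; IDs that cannot be read are
    recorded as -1 (an assumption of this sketch, not perf's behaviour).
    """
    clusters = defaultdict(list)
    # "cpu[0-9]*" skips non-CPU entries such as cpufreq or cpuidle.
    for topo in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*", "topology")):
        cpu = int(os.path.basename(os.path.dirname(topo))[3:])
        ids = []
        for name in ("physical_package_id", "die_id", "cluster_id"):
            try:
                with open(os.path.join(topo, name)) as f:
                    ids.append(int(f.read().strip()))
            except (OSError, ValueError):
                ids.append(-1)
        clusters[tuple(ids)].append(cpu)
    return {key: sorted(cpus) for key, cpus in clusters.items()}


if __name__ == "__main__":
    # Print one line per cluster, labelled in the style of the output above.
    for (pkg, die, cls), cpus in sorted(cluster_map().items()):
        print(f"S{pkg}-D{die}-CLS{cls}: CPUs {cpus}")
```

Having to run something like this next to perf stat is the extra step I'd like to avoid.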

Any better method to show the mapping between CPUs and cluster IDs?

Perhaps adding the cluster ID to the "--per-core" output, when the platform has clusters, would help.

Apart from that, this works well on my aarch64 machine.

Tested-by: Jie Zhan <zhanjie9@xxxxxxxxxxxxx>