Re: [PATCH] perf mem/c2c: Document that SPE is used for mem and c2c on Arm

From: Arnaldo Carvalho de Melo
Date: Tue Jan 24 2023 - 13:37:20 EST


On Tue, Jan 24, 2023 at 02:59:29PM +0000, James Clark wrote:
> Setup is non-trivial so also link to the full SPE docs.

Thanks, applied.

- Arnaldo


> Signed-off-by: James Clark <james.clark@xxxxxxx>
> ---
> tools/perf/Documentation/perf-c2c.txt | 8 ++++++--
> tools/perf/Documentation/perf-mem.txt | 7 ++++++-
> 2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
> index af5c3106f468..4e8c263e1721 100644
> --- a/tools/perf/Documentation/perf-c2c.txt
> +++ b/tools/perf/Documentation/perf-c2c.txt
> @@ -22,7 +22,11 @@ you to track down the cacheline contentions.
> On Intel, the tool is based on load latency and precise store facility events
> provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
> with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
> -limitations, perf c2c is not supported on Zen3 cpus).
> +limitations, perf c2c is not supported on Zen3 cpus). On Arm64 it uses SPE to
> +sample load and store operations, therefore hardware and kernel support is
> +required. See linkperf:perf-arm-spe[1] for a setup guide. Due to the
> +statistical nature of Arm SPE sampling, not every memory operation will be
> +sampled.
>
> These events provide:
> - memory address of the access
> @@ -333,4 +337,4 @@ Check Joe's blog on c2c tool for detailed use case explanation:
>
> SEE ALSO
> --------
> -linkperf:perf-record[1], linkperf:perf-mem[1]
> +linkperf:perf-record[1], linkperf:perf-mem[1], linkperf:perf-arm-spe[1]
> diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
> index 005c95580b1e..19862572e3f2 100644
> --- a/tools/perf/Documentation/perf-mem.txt
> +++ b/tools/perf/Documentation/perf-mem.txt
> @@ -23,6 +23,11 @@ Note that on Intel systems the memory latency reported is the use-latency,
> not the pure load (or store latency). Use latency includes any pipeline
> queueing delays in addition to the memory subsystem latency.
>
> +On Arm64 this uses SPE to sample load and store operations, therefore hardware
> +and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
> +Due to the statistical nature of SPE sampling, not every memory operation will
> +be sampled.
> +
> OPTIONS
> -------
> <command>...::
> @@ -93,4 +98,4 @@ all perf record options.
>
> SEE ALSO
> --------
> -linkperf:perf-record[1], linkperf:perf-report[1]
> +linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]
>
> base-commit: 5670ebf54bd26482f57a094c53bdc562c106e0a9
> --
> 2.39.1
>
