Re: [PATCH 1/2] arm_pmu: fix event CPU filtering

From: Janne Grunau
Date: Thu Feb 16 2023 - 09:35:37 EST


On 2023-02-16 14:12:38 +0000, Mark Rutland wrote:
> Janne reports that perf has been broken on Apple M1 as of commit:
>
> bd27568117664b8b ("perf: Rewrite core context handling")
>
> That commit replaced the pmu::filter_match() callback with
> pmu::filter(), whose return value has the opposite polarity, with true
> implying events should be ignored rather than scheduled. While an
> attempt was made to update the logic in armv8pmu_filter() and
> armpmu_filter() accordingly, the return value remains inverted in a
> couple of cases:
>
> * If the arm_pmu does not have an arm_pmu::filter() callback,
>   armpmu_filter() will always return whether the CPU is supported rather
>   than whether the CPU is not supported.
>
>   As a result, the perf core will not schedule events on supported CPUs,
>   resulting in a loss of events. Additionally, the perf core will
>   attempt to schedule events on unsupported CPUs, but this will be
>   rejected by armpmu_add(), which may result in a loss of events from
>   other PMUs on those unsupported CPUs.
>
> * If the arm_pmu does have an arm_pmu::filter() callback, and
>   armpmu_filter() is called on a CPU which is not supported by the
>   arm_pmu, armpmu_filter() will return false rather than true.
>
>   As a result, the perf core will attempt to schedule events on
>   unsupported CPUs, but this will be rejected by armpmu_add(), which may
>   result in a loss of events from other PMUs on those unsupported CPUs.
>
> This means a loss of events can be seen with any arm_pmu driver, but
> with the ARMv8 PMUv3 driver (which is the only arm_pmu driver with an
> arm_pmu::filter() callback) the event loss will be more limited and may
> go unnoticed, which is how this issue evaded testing so far.
>
> Fix the CPU filtering by performing this consistently in
> armpmu_filter(), and remove the redundant arm_pmu::filter() callback and
> armv8pmu_filter() implementation.
>
> Commit bd2756811766 also silently removed the CHAIN event filtering from
> armv8pmu_filter(), which will be addressed by a separate patch without
> using the filter callback.
>
> Fixes: bd27568117664b8b ("perf: Rewrite core context handling")
> Reported-by: Janne Grunau <j@xxxxxxxxxx>
> Link: https://lore.kernel.org/asahi/20230215-arm_pmu_m1_regression-v1-1-f5a266577c8d@xxxxxxxxxx/
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> Cc: Asahi Lina <lina@xxxxxxxxxxxxx>
> Cc: Eric Curtin <ecurtin@xxxxxxxxxx>
> ---
> arch/arm64/kernel/perf_event.c | 7 -------
> drivers/perf/arm_pmu.c | 8 +-------
> include/linux/perf/arm_pmu.h | 1 -
> 3 files changed, 1 insertion(+), 15 deletions(-)
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index a5193f2146a6..3e43538f6b72 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1023,12 +1023,6 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>  	return 0;
>  }
>
> -static bool armv8pmu_filter(struct pmu *pmu, int cpu)
> -{
> -	struct arm_pmu *armpmu = to_arm_pmu(pmu);
> -	return !cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus);
> -}
> -
>  static void armv8pmu_reset(void *info)
>  {
>  	struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
> @@ -1258,7 +1252,6 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
>  	cpu_pmu->stop = armv8pmu_stop;
>  	cpu_pmu->reset = armv8pmu_reset;
>  	cpu_pmu->set_event_filter = armv8pmu_set_event_filter;
> -	cpu_pmu->filter = armv8pmu_filter;
>
>  	cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 9b593f985805..40f70f83daba 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -550,13 +550,7 @@ static void armpmu_disable(struct pmu *pmu)
>  static bool armpmu_filter(struct pmu *pmu, int cpu)
>  {
>  	struct arm_pmu *armpmu = to_arm_pmu(pmu);
> -	bool ret;
> -
> -	ret = cpumask_test_cpu(cpu, &armpmu->supported_cpus);
> -	if (ret && armpmu->filter)
> -		return armpmu->filter(pmu, cpu);
> -
> -	return ret;
> +	return !cpumask_test_cpu(cpu, &armpmu->supported_cpus);
>  }
>
>  static ssize_t cpus_show(struct device *dev,
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index ef914a600087..525b5d64e394 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -100,7 +100,6 @@ struct arm_pmu {
> void (*stop)(struct arm_pmu *);
> void (*reset)(void *);
> int (*map_event)(struct perf_event *event);
> - bool (*filter)(struct pmu *pmu, int cpu);
> int num_events;
> bool secure_access; /* 32-bit ARM only */
> #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40

This works as well. I limited the patch to the minimal fix this
late in the cycle.
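
For anyone following along, the polarity problem in the no-callback
case (which, per the commit message, covers every arm_pmu driver other
than PMUv3) can be modelled in plain userspace C. This is only a
sketch: struct model_pmu, the 64-bit mask and the model_* helpers are
made up for illustration and are not the kernel's cpumask or arm_pmu
API. It assumes CPU numbers in the range 0..63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct arm_pmu: one bit per supported CPU. */
struct model_pmu {
	uint64_t supported_cpus;
};

/* Models cpumask_test_cpu(cpu, &armpmu->supported_cpus); cpu must be 0..63. */
static bool model_cpu_supported(const struct model_pmu *pmu, int cpu)
{
	return (pmu->supported_cpus >> cpu) & 1;
}

/*
 * Pre-fix behaviour without an arm_pmu::filter() callback: the function
 * returns "CPU is supported", although the caller reads true as "ignore".
 */
static bool model_filter_broken(const struct model_pmu *pmu, int cpu)
{
	return model_cpu_supported(pmu, cpu);
}

/* Post-fix behaviour: true means "ignore events on this CPU". */
static bool model_filter_fixed(const struct model_pmu *pmu, int cpu)
{
	return !model_cpu_supported(pmu, cpu);
}

/* Caller's view: an event is a scheduling candidate only if not filtered. */
static bool model_would_schedule(bool filtered)
{
	return !filtered;
}
```

With supported_cpus = 0x0f (CPUs 0-3), model_filter_fixed() returns
false for CPU 2 and true for CPU 5. The broken variant returns the
opposite, so events are dropped on exactly the CPUs that support them.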

Tested-by: Janne Grunau <j@xxxxxxxxxx>

thanks,
Janne