Re: [PATCH V7 1/2] perf: Track per-PMU sched_task() callback users

From: Peter Zijlstra
Date: Thu May 13 2021 - 10:50:12 EST


On Thu, May 13, 2021 at 07:23:01AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>
> Current perf only tracks the per-CPU sched_task() callback users, which
> doesn't work if a callback user is a task. For example, the dirty
> counters have to be cleared to prevent data leakage when a new RDPMC
> task is scheduled in. The task may be created on one CPU but running on
> another CPU. It cannot be tracked by the per-CPU variable. A global
> variable is not going to work either because of the hybrid PMUs.
> Add a per-PMU variable to track the callback users.
>
> In theory, the per-PMU variable should be checked everywhere the
> sched_task() can be called. But the X86 RDPMC is the only user for the
> per-PMU sched_cb_usage. A callback for the X86 RDPMC is required only
> when a different context is scheduled in. To avoid unnecessary
> sched_task() invoke, the per-PMU sched_cb_usage is only checked there.
> Should there be any other ARCHs which require it in the other places,
> it can be added later separately.
>
> Suggested-by: Rob Herring <robh@xxxxxxxxxx>
> Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> ---
>
> - New patch. Split the V6 to core and x86 parts.
>
> include/linux/perf_event.h | 3 +++
> kernel/events/core.c | 9 ++++++++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index c8a3388..c6ee202 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,9 @@ struct pmu {
> /* number of address filters this PMU can do */
> unsigned int nr_addr_filters;
>
> + /* Track the per PMU sched_task() callback users */
> + atomic_t sched_cb_usage;
> +
> /*
> * Fully disable/enable this PMU, can be used to protect from the PMI
> * as well as for lazy/batch writing of the MSRs.
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1574b70..286b718 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3851,7 +3851,14 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
> cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
> perf_event_sched_in(cpuctx, ctx, task);
>
> - if (cpuctx->sched_cb_usage && pmu->sched_task)
> + /*
> + * X86 RDPMC is the only user for the per-PMU sched_cb_usage.

I think we can do without this line; since we know ARM64 also
potentially wants this.

> + * A callback for the X86 RDPMC is required only when a

Also, I think we spell it: x86.

> + * different context is scheduled in.
> + * To avoid unnecessary sched_task() invoke, the per-PMU
> + * sched_cb_usage is only checked here.
> + */
> + if (pmu->sched_task && (cpuctx->sched_cb_usage || atomic_read(&pmu->sched_cb_usage)))
> pmu->sched_task(cpuctx->task_ctx, true);
>
> perf_pmu_enable(pmu);

I'll sit on these patches a wee bit until Rob has provided feedback, but
I'm thinking this should do.