Re: [PATCH 0/4] perf: Fix the ctx->pmu for a hybrid system

From: Liang, Kan
Date: Thu Jun 17 2021 - 10:10:43 EST




On 6/17/2021 7:33 AM, Peter Zijlstra wrote:
On Thu, Jun 17, 2021 at 12:23:06PM +0200, Peter Zijlstra wrote:
On Wed, Jun 16, 2021 at 11:55:30AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:

To fix the issue, the generic perf codes have to understand the
supported CPU mask of a specific hybrid PMU. So it can update the
ctx->pmu accordingly, when a task is scheduled on a CPU which has
a different type of PMU from the previous CPU. The supported_cpus
has to be moved to the struct pmu.

Urghh.. I so hate this :-/

I *did* point you to:

https://lore.kernel.org/lkml/20181010104559.GO5728@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

when you started this whole hybrid crud

Yes. To work around the hybrid case, I updated the PMU for the CPU context accordingly, but not for the task context. :( This issue was found by a stress test that was not ready at that time. Sorry for that.

, and I think that's still the
correct thing to do.
Still, let me consider if there's a workable short-term cludge I hate
less.

How's this? We already have x86_pmu_update_cpu_context() setting the
'correct' pmu in the cpuctx, so we can simply fold that back into the
task context.

For normal use this is a no-op.

Now I need to go audit all ctx->pmu usage :-(

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index db4604c4c502..6a496c29ef00 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3822,9 +3822,16 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 					struct task_struct *task)
 {
 	struct perf_cpu_context *cpuctx;
-	struct pmu *pmu = ctx->pmu;
+	struct pmu *pmu;
 	cpuctx = __get_cpu_context(ctx);
+
+	/*
+	 * HACK; for HETEROGENEOUS the task context might have switched to a
+	 * different PMU, don't bother gating this.
+	 */
+	pmu = ctx->pmu = cpuctx->ctx.pmu;
+

I think all the perf_sw_context PMUs share the same pmu_cpu_context, so cpuctx->ctx.pmu should always be the first registered perf_sw_context PMU, which is perf_swevent. The ctx->pmu could be another software PMU.

In theory, the perf_sw_context PMUs should have a similar issue. If the events are from different perf_sw_context PMUs, we should perf_pmu_disable() all of those PMUs before scheduling the events, but ctx->pmu only tracks the first one.

I don't have a good way to fix the perf_sw_context PMUs. I think we have to go through the event list and find all PMUs. But I don't think it's worth doing.
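
To illustrate what "go through the event list and find all PMUs" would involve, here is a rough user-space sketch, not kernel code. The structs are simplified stand-ins and collect_pmus() is a hypothetical helper; none of this is taken from kernel/events/core.c. It only shows that every sched-in would pay for a walk over the events plus a duplicate check per event.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration only; not the kernel definitions. */
struct pmu {
	const char *name;
};

struct event {
	struct pmu *pmu;
	struct event *next;
};

/*
 * Hypothetical helper: walk a singly linked event list and store each
 * distinct PMU once in out[], in first-seen order. Returns the number of
 * PMUs stored. PMUs beyond 'max' distinct entries are silently dropped.
 * Cost is O(events * distinct PMUs).
 */
static size_t collect_pmus(const struct event *head, struct pmu **out,
			   size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);

	for (const struct event *e = head; e; e = e->next) {
		size_t i;

		/* Linear scan: have we already recorded this PMU? */
		for (i = 0; i < n; i++) {
			if (out[i] == e->pmu)
				break;
		}
		if (i == n && n < max)
			out[n++] = e->pmu;
	}
	return n;
}
```

A caller would then have to disable every PMU in out[] before scheduling the events and re-enable them afterwards, instead of touching only ctx->pmu.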

Maybe we should only apply the change for the hybrid PMUs, and leave other PMUs as is.

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6fee4a7..df9cce6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3821,9 +3821,19 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 					struct task_struct *task)
 {
 	struct perf_cpu_context *cpuctx;
-	struct pmu *pmu = ctx->pmu;
+	struct pmu *pmu;

 	cpuctx = __get_cpu_context(ctx);
+
+	if (ctx->pmu->capabilities & PERF_PMU_CAP_HETEROGENEOUS_CPUS) {
+		/*
+		 * HACK; for HETEROGENEOUS the task context might have switched to a
+		 * different PMU, don't bother gating this.
+		 */
+		pmu = ctx->pmu = cpuctx->ctx.pmu;
+	} else
+		pmu = ctx->pmu;
+
 	if (cpuctx->task_ctx == ctx) {
 		if (cpuctx->sched_cb_usage)
 			__perf_pmu_sched_task(cpuctx, true);
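
The effect of the gating can be modelled in user space with simplified stand-in structs. This is a minimal sketch under assumptions, not the kernel implementation: sched_in_select_pmu() is a hypothetical name, the structs carry only the fields needed here, and the flag value is a placeholder for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder value for this sketch; the real flag is defined by the kernel. */
#define PERF_PMU_CAP_HETEROGENEOUS_CPUS 0x40

/* Simplified stand-ins for illustration only; not the kernel definitions. */
struct pmu {
	const char *name;
	unsigned int capabilities;
};

struct perf_event_context {
	struct pmu *pmu;
};

struct perf_cpu_context {
	struct perf_event_context ctx;
};

/*
 * Hypothetical model of the PMU selection at sched-in: a task context
 * whose PMU is flagged heterogeneous adopts the PMU of the CPU it is
 * being scheduled on; any other task context keeps its own PMU.
 */
static struct pmu *sched_in_select_pmu(struct perf_event_context *task_ctx,
				       struct perf_cpu_context *cpuctx)
{
	assert(task_ctx != NULL && task_ctx->pmu != NULL && cpuctx != NULL);

	if (task_ctx->pmu->capabilities & PERF_PMU_CAP_HETEROGENEOUS_CPUS)
		task_ctx->pmu = cpuctx->ctx.pmu;

	return task_ctx->pmu;
}
```

In this model a hybrid task context that last ran on one CPU type picks up the other type's PMU when it migrates, while a software task context keeps its own ctx->pmu even though the per-CPU software context points at the first registered software PMU.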



Thanks,
Kan