Re: [PATCH] perf/core: Fix endless multiplex timer

From: Liang, Kan
Date: Wed Mar 04 2020 - 09:20:46 EST




On 3/4/2020 4:33 AM, Peter Zijlstra wrote:
> On Tue, Mar 03, 2020 at 08:40:10PM -0500, Liang, Kan wrote:
>
> I'm thinking this is wrong.
>
> That is, yes, this fixes the observed problem, but it also misses at
> least one other site. Which seems to suggest we ought to take a
> different approach.
>
> But even with that; I wonder if the actual condition isn't wrong.
> Suppose the event was exclusive, and other events weren't scheduled
> because of that. Then you disable the one exclusive event _and_ kill
> rotation, so then nothing else will ever get on.
>
> So what I think was supposed to happen is rotation killing itself;
> rotation will schedule out the context -- which will clear the flag, and
> then schedule the thing back in -- which will set the flag again when
> needed.
>
> Now, that isn't happening, and I think I see why, because when we drop
> to !nr_active, we terminate ctx_sched_out() before we get to clearing
> the flag, oops!
>
> So how about something like this?
>
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e453589da97c..7947bd3271a9 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2182,6 +2182,7 @@ __perf_remove_from_context(struct perf_event *event,
>  	if (!ctx->nr_events && ctx->is_active) {
>  		ctx->is_active = 0;
> +		ctx->rotate_necessary = 0;
>  		if (ctx->task) {
>  			WARN_ON_ONCE(cpuctx->task_ctx != ctx);
>  			cpuctx->task_ctx = NULL;


The patch fixes the observed problem with the uncore PMU, but it cannot
fix all the cases with the core PMU, especially when the NMI watchdog is
enabled, because ctx->nr_events is never 0 while the NMI watchdog is
enabled.

> But, I'm confused.. why do we care about nr_events==0 ? The below: vvvv
>
> @@ -3074,15 +3075,15 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>  	is_active ^= ctx->is_active; /* changed bits */
>
> -	if (!ctx->nr_active || !(is_active & EVENT_ALL))
> -		return;
> -
>  	/*
>  	 * If we had been multiplexing, no rotations are necessary, now no events
>  	 * are active.
>  	 */
>  	ctx->rotate_necessary = 0;
> +	if (!ctx->nr_active || !(is_active & EVENT_ALL))
> +		return;
> +
>  	perf_pmu_disable(ctx->pmu);
>  	if (is_active & EVENT_PINNED) {
>  		list_for_each_entry_safe(event, tmp, &ctx->pinned_active, active_list)

> Makes sure we clear the flag when we ctx_sched_out(), and as long as
> ctx->rotate_necessary is set, perf_rotate_context() will do exactly
> that.
>
> Then ctx_sched_in() will re-set the flag if it failed to schedule a
> counter.
>
> So where is that going wrong?

The NMI watchdog is a pinned event, and ctx_event_to_rotate() will only
pick an event from the flexible_groups. So the cpu_ctx_sched_out() in
perf_rotate_context() will never be called.

Thanks,
Kan