Re: [PATCH v3] perf/core: Set event shadow time for inactive events too

From: Peter Zijlstra
Date: Fri Dec 10 2021 - 05:20:17 EST


On Thu, Dec 09, 2021 at 01:51:42PM -0800, Namhyung Kim wrote:
> On Thu, Dec 9, 2021 at 3:35 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Dec 09, 2021 at 12:26:32PM +0100, Peter Zijlstra wrote:
> > > On Sun, Dec 05, 2021 at 02:48:43PM -0800, Namhyung Kim wrote:
> > >
> > > > Actually 18446744069443110306 is 0xffffffff01b345a2 so it seems to
> > > > have a negative enabled time. In fact, bperf keeps values returned by
> > > > bpf_perf_event_read_value() which calls perf_event_read_local(), and
> > > > accumulates delta between two calls. When event->shadow_ctx_time is
> > > > not set, it'd return invalid enabled time which is bigger than normal.
> > >
> > > *that*, how does it happen that shadow_time isn't set? It should be last
> > > set when the event switches to INACTIVE, no? At which point the logic in
> > > perf_event_read_local() should make @enabled move forward while @running
> > > stays put.
> > >
> > > Let me go rummage around a bit... either I'm missing something obvious
> > > or something's smelly.
> >
> > How's this then?
>
> Still the same :(

You're doing that bpf-cgroup crud, right? Where exactly do you hook into
to do the counter reads?

> Maybe because the event is enabled from the beginning.
> Then it might miss set_state/update_time at all.

Even then, it's set to INACTIVE and any state change thereafter needs to
go through perf_event_set_state() and update the relevant timestamps.

> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 28aaeacdaea1..20637b7f420c 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -640,6 +640,9 @@ __perf_update_times(struct perf_event *event, u64 now, u64 *enabled, u64 *runnin
> > *running += delta;
> > }
> >
> > +static void perf_set_shadow_time(struct perf_event *event,
> > + struct perf_event_context *ctx);
> > +
> > static void perf_event_update_time(struct perf_event *event)
> > {
> > u64 now = perf_event_time(event);
> > @@ -647,6 +650,7 @@ static void perf_event_update_time(struct perf_event *event)
> > __perf_update_times(event, now, &event->total_time_enabled,
> > &event->total_time_running);
> > event->tstamp = now;
> > + perf_set_shadow_time(event, event->ctx);
>
> I like this.

Right, it keeps the shadow timestamp thingy in sync. Specifically it was
missing an update on event sched_out. Although thinking about it more,
that shouldn't make a difference since the relative displacement of the
clocks doesn't change at that point. All that changes there is that
RUNNING should stop advancing.

So in that regards, this not actually changing anything makes sense.

> > @@ -3748,15 +3727,14 @@ static int merge_sched_in(struct perf_event *event, void *data)
> > }
> >
> > if (event->state == PERF_EVENT_STATE_INACTIVE) {
> > - *can_add_hw = 0;
> > if (event->attr.pinned) {
> > perf_cgroup_event_disable(event, ctx);
> > perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
> > - } else {
> > - ctx->rotate_necessary = 1;
> > - perf_mux_hrtimer_restart(cpuctx);
> > - group_update_userpage(event);
> > }
> > +
> > + *can_add_hw = 0;
> > + ctx->rotate_necessary = 1;
> > + perf_mux_hrtimer_restart(cpuctx);
>
> Not sure about this. We might not want to rotate them
> if a pinned event failed...?

It's just a straight revert, but you're right, this stuff needs
some improvement.