Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics

From: Peter Zijlstra
Date: Mon Sep 30 2019 - 10:53:34 EST


On Mon, Sep 30, 2019 at 04:07:55PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 30, 2019 at 03:06:15PM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 16, 2019 at 06:41:21AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
>
> > > +static bool is_first_topdown_event_in_group(struct perf_event *event)
> > > +{
> > > + struct perf_event *first = NULL;
> > > +
> > > + if (is_topdown_event(event->group_leader))
> > > + first = event->group_leader;
> > > + else {
> > > + for_each_sibling_event(first, event->group_leader)
> > > + if (is_topdown_event(first))
> > > + break;
> > > + }
> > > +
> > > + if (event == first)
> > > + return true;
> > > +
> > > + return false;
> > > +}
> >
> > > +static u64 icl_update_topdown_event(struct perf_event *event)
> > > +{
> > > + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > > + struct perf_event *other;
> > > + u64 slots, metrics;
> > > + int idx;
> > > +
> > > + /*
> > > + * Only need to update all events for the first
> > > + * slots/metrics event in a group
> > > + */
> > > + if (event && !is_first_topdown_event_in_group(event))
> > > + return 0;
> >
> > This is pretty crap and approaches O(n^2); let me think if there's
> > anything saner to do here.
>
> This is also really complicated in the case where we do
> perf_remove_from_context() in the 'wrong' order.
>
> In that case we get detached events that are not up-to-date (and never
> will be). It doesn't look like that matters, but it is weird.

So we either get called from the PMI, or from read(). In the PMI there is
the perf_output_read_group() path, and that too appears broken vs the
above: it assumes perf_event_count() is up-to-date after calling
pmu->read(), which isn't true.

Now, I'm thinking that is already broken vs TXN_READ, so we should fix
that with something like the below (needs to be tested on
Power-hv-24x7).

---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6272,10 +6272,22 @@ static void perf_output_read_group(struc
if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
values[n++] = running;

+ if (leader->nr_siblings > 1)
+ leader->pmu->start_txn(leader->pmu, PERF_PMU_TXN_READ);
+
if ((leader != event) &&
(leader->state == PERF_EVENT_STATE_ACTIVE))
leader->pmu->read(leader);

+ for_each_sibling_event(sub, leader) {
+ if ((sub != event) &&
+ (sub->state == PERF_EVENT_STATE_ACTIVE))
+ sub->pmu->read(sub);
+ }
+
+ if (leader->nr_siblings > 1)
+ leader->pmu->commit_txn(leader->pmu);
+
values[n++] = perf_event_count(leader);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(leader);
@@ -6285,10 +6297,6 @@ static void perf_output_read_group(struc
for_each_sibling_event(sub, leader) {
n = 0;

- if ((sub != event) &&
- (sub->state == PERF_EVENT_STATE_ACTIVE))
- sub->pmu->read(sub);
-
values[n++] = perf_event_count(sub);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(sub);
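
To illustrate why the reads have to happen before any count is reported,
here is a minimal user-space sketch of a PMU that defers reads inside a
read transaction. Everything in it (mock_pmu, mock_event, MOCK_TXN_READ,
the mock_* functions) is hypothetical and only models the pattern; it is
not the kernel's struct pmu and not what hv-24x7 actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_TXN_READ   0x4	/* hypothetical stand-in for PERF_PMU_TXN_READ */
#define MOCK_MAX_EVENTS 8

struct mock_event {
	uint64_t hw;	/* pretend hardware counter value */
	uint64_t count;	/* software copy, what a perf_event_count() analogue reports */
};

struct mock_pmu {
	unsigned int txn_flags;
	struct mock_event *queued[MOCK_MAX_EVENTS];
	size_t nr_queued;
	unsigned int nr_hw_accesses;
};

static void mock_start_txn(struct mock_pmu *pmu, unsigned int flags)
{
	assert(pmu->txn_flags == 0);	/* transactions do not nest in this model */
	pmu->txn_flags = flags;
	pmu->nr_queued = 0;
}

/* Inside a read transaction the read is only queued; ev->count stays stale. */
static void mock_read(struct mock_pmu *pmu, struct mock_event *ev)
{
	if (pmu->txn_flags & MOCK_TXN_READ) {
		assert(pmu->nr_queued < MOCK_MAX_EVENTS);
		pmu->queued[pmu->nr_queued++] = ev;
		return;
	}
	pmu->nr_hw_accesses++;
	ev->count = ev->hw;
}

/* One batched hardware access updates every queued event. */
static int mock_commit_txn(struct mock_pmu *pmu)
{
	if (pmu->txn_flags & MOCK_TXN_READ) {
		pmu->nr_hw_accesses++;
		for (size_t i = 0; i < pmu->nr_queued; i++)
			pmu->queued[i]->count = pmu->queued[i]->hw;
	}
	pmu->txn_flags = 0;
	pmu->nr_queued = 0;
	return 0;
}

/*
 * Read every event in the group first, commit, and only then copy the
 * counts out. Returns the number of hardware accesses that were needed.
 */
static unsigned int mock_read_group(struct mock_pmu *pmu,
				    struct mock_event *evs, size_t n,
				    uint64_t *values)
{
	pmu->nr_hw_accesses = 0;

	mock_start_txn(pmu, MOCK_TXN_READ);
	for (size_t i = 0; i < n; i++)
		mock_read(pmu, &evs[i]);
	mock_commit_txn(pmu);

	for (size_t i = 0; i < n; i++)
		values[i] = evs[i].count;

	return pmu->nr_hw_accesses;
}
```

In this model, copying evs[i].count out right after mock_read() and
before mock_commit_txn() would report stale values, which is the same
ordering problem as interleaving pmu->read() and perf_event_count() per
sibling.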


After that, I think we can simply do something like:

icl_update_topdown_event(..)
{
int idx = event->hwc.idx;

if (is_metric_idx(idx))
return;

// must be FIXED_SLOTS

/* do the thing and update SLOTS and METRIC together */
}

Hmmm?
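
A stand-alone toy version of that idea, to make the control flow
concrete. All names here (toy_group, toy_is_metric_idx, the TOY_*
indices) are hypothetical, and the metric decoding (one 8-bit fraction
of slots per metric) is an assumption made purely for illustration, not
something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IDX_SLOTS       0	/* hypothetical index of the fixed SLOTS counter */
#define TOY_IDX_METRIC_BASE 1	/* hypothetical first metric index */
#define TOY_NR_METRICS      4

struct toy_group {
	uint64_t hw_slots;	/* pretend SLOTS counter */
	uint64_t hw_metrics;	/* pretend metrics register, one byte per metric */
	uint64_t count[1 + TOY_NR_METRICS];
	unsigned int nr_hw_reads;
};

static int toy_is_metric_idx(int idx)
{
	return idx >= TOY_IDX_METRIC_BASE &&
	       idx < TOY_IDX_METRIC_BASE + TOY_NR_METRICS;
}

/*
 * Metric events do nothing on their own; the SLOTS event reads both
 * registers once and updates every count in the group.
 */
static uint64_t toy_update_topdown_event(struct toy_group *g, int idx)
{
	uint64_t slots, metrics;

	if (toy_is_metric_idx(idx))
		return 0;

	assert(idx == TOY_IDX_SLOTS);	/* must be the SLOTS event */

	slots = g->hw_slots;
	metrics = g->hw_metrics;
	g->nr_hw_reads++;

	g->count[TOY_IDX_SLOTS] = slots;
	for (int i = 0; i < TOY_NR_METRICS; i++) {
		/* Assumed encoding: each byte is a fraction of slots, out of 0xff. */
		uint64_t frac = (metrics >> (8 * i)) & 0xff;

		g->count[TOY_IDX_METRIC_BASE + i] = slots * frac / 0xff;
	}

	return slots;
}
```

Note that a metric event's count only becomes current once the SLOTS
event has been updated, so this relies on the caller reading every
member of the group before reporting any of the counts.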