Re: [PATCH] perf_events: improve x86 event scheduling (v6 incremental)

From: Peter Zijlstra
Date: Mon Jan 25 2010 - 13:00:01 EST


On Mon, 2010-01-25 at 18:48 +0100, stephane eranian wrote:

> >> It seems a solution would be to call x86_pmu_disable() before
> >> assigning an event to a new counter for all events which are
> >> moving. This is because we cannot assume all events have been
> >> previously disabled individually. Something like
> >>
> >> if (!match_prev_assignment(hwc, cpuc, i)) {
> >> if (hwc->idx != -1)
> >> x86_pmu.disable(hwc, hwc->idx);
> >> x86_assign_hw_event(event, cpuc, cpuc->assign[i]);
> >> x86_perf_event_set_period(event, hwc, hwc->idx);
> >> }
> >
> > Yes and no, my worry is not that its not counting, but that we didn't
> > store the actual counter value before over-writing it with the new
> > configuration.
> >
> > As to your suggestion,
> > 1) we would have to do x86_pmu_disable() since that does
> > x86_perf_event_update().
> > 2) I worried about the case where we basically switch two counters,
> > there we cannot do the x86_perf_event_update() in a single pass since
> > programming the first counter will destroy the value of the second.
> >
> > Now possibly the scenario in 2 isn't possible because the event
> > scheduling is stable enough for this never to happen, but I wasn't
> > feeling too sure about that, so I skipped this part for now.
> >
> I think what adds to the complexity here is that there are two distinct
> disable() mechanisms: perf_disable() and x86_pmu.disable(). They
> don't operate the same way. You would think that by calling hw_perf_disable()
> you would stop individual events as well (thus saving their values). That
> means that if you do perf_disable() and then read the count, you will not
> get the up-to-date value in event->count. you need pmu->disable(event)
> to ensure that.

No, a read is actually good enough, it does x86_perf_event_update(), but
we're not guaranteeing that read is present.

So yes, perf_disable() is basically a local_irq_disable() but for perf
events, all it has to do is ensure there's no concurrency. ->disable()
will really tear the configuration down.

Either ->disable() or ->read() will end up calling
x86_perf_event_update() which is needed to read the actual hw counter
value and propagate it into event storage.

> So my understanding is that perf_disable() is meant for a temporary stop,
> thus no need to save the count.
>
> As for 2, I believe this can happen if you add 2 new events which have more
> restrictions. For instance on Core, you were measuring cycles, inst in generic
> counters, then you add 2 events which can only be measured on generic counters.
> That will cause cycles, inst to be moved to fixed counters.
>
> So we have to modify hw_perf_enable() to first disable all events
> which are moving,
> then reprogram them. I suspect it may be possible to optimize this if
> we detect that
> those events had already been stopped individually (as opposed to
> perf_disable()), i.e.,
> already had their counts saved.

Right, I see no fundamentally impossible things at all, we just need to
be careful here.

Anyway, I poked at the stack I've got now and it seems to hold up when I
poke at it with various combinations of constraint events, so I'll push
that off to Ingo and then we can go from there.

Thanks for working on this!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/