Re: [PATCH] perf_events: improve x86 event scheduling (v5)

From: Peter Zijlstra
Date: Tue Jan 19 2010 - 07:23:21 EST


On Mon, 2010-01-18 at 18:29 +0100, Frederic Weisbecker wrote:

> It has constraints that only need to be checked when we register
> the event. It has also constraint on enable time but nothing
> tricky that requires an overwritten group scheduling.

The fact that ->enable() can fail makes it a hardware counter. Software
counters cannot fail enable.

Having multiple groups of failable events (multiple hardware pmus) can
go wrong with the current core in interesting ways, look for example at
__perf_event_sched_in():

It does:

int can_add_hw = 1;

...

list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
/* Ignore events in OFF or ERROR state */
if (event->state <= PERF_EVENT_STATE_OFF)
continue;
/*
* Listen to the 'cpu' scheduling filter constraint
* of events:
*/
if (event->cpu != -1 && event->cpu != cpu)
continue;

if (group_can_go_on(event, cpuctx, can_add_hw))
if (group_sched_in(event, cpuctx, ctx, cpu))
can_add_hw = 0;
}

Now, if you look at that logic you'll see that it assumes there's one hw
device since it only has one can_add_hw state. So if your hw_breakpoint
pmu starts to fail we'll also stop adding counters to the cpu pmu (for
lack of a better name) and vs.

This might be fixable by using per-cpu struct pmu variables.

I'm going to try and move all the weak hw_perf_* functions into struct
pmu and create a notifier like callchain for them so we can have proper
per pmu state, and then use that to fix these things up.

However I'm afraid its far to late to push any of that into .33, which
means .33 will have rather funny behaviour once the breakpoints start
getting used.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/