Re: [PATCH] perf_events: fix and improve x86 event scheduling

From: Stephane Eranian
Date: Mon Nov 07 2011 - 08:52:52 EST

Next message: Borislav Petkov: "Re: [PATCH 1/4] lockdep: lock_set_subclass() fix"
Previous message: Paul Bolle: "[PATCH] [TRIVIAL] 8250_hp300: Fix warning typo 'CONFIG_8250'"
In reply to: Peter Zijlstra: "Re: [PATCH] perf_events: fix and improve x86 event scheduling"
Next in thread: Peter Zijlstra: "Re: [PATCH] perf_events: fix and improve x86 event scheduling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Nov 7, 2011 at 12:10 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2011-11-07 at 12:01 +0100, Stephane Eranian wrote:
>> + Â Â Â Â Â Â Â Â Â Â Â /*
>> + Â Â Â Â Â Â Â Â Â Â Â Â* scan all possible counters for this event
>> + Â Â Â Â Â Â Â Â Â Â Â Â* but use the one with the smallest counter weight,
>> + Â Â Â Â Â Â Â Â Â Â Â Â* i.e., give a chance to other less constrained events
>> + Â Â Â Â Â Â Â Â Â Â Â Â*/
>> Â Â Â Â Â Â Â Â Â Â Â Â for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
>>
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â if (test_bit(j, used_mask))
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â continue;
>> +
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â if (wcnt[j] < min_wcnt) {
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â min_wcnt = wcnt[j];
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â wcnt_idx = j;
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â }
>> + Â Â Â Â Â Â Â Â Â Â Â }
>
> The problem with this is that it will typically hit the worst case for
> Intel fixed-purpose events since the fixed purpose counters have the
> highest counter index and their constraint masks are the heaviest in the
> system ensuring we hit the max loop count on the top loop.
>
Yes, but to avoid this you'd need some form of sorting.
We could also cut down on top loop iterations by caching the weight list
of the events and examining only the weights we do have.

As for the complexity, I think it is mostly bound to the number of
counters rather
than event.

The top loop is about event weights, that's bound by the number of counters.
The middle loop is about the number of events.
The inner loop is about the number of counters.

So I'd expect the complexity to be O(c^2*n)
where c is the number of counters, n the number of events.
But given we limit the number of events to that of counters,
we do have O(c^3).

As for the map_idx, it's there to track the position of each event in the
initial event list. We shuffle events between constrained and unconstrained.
By stashing the map_idx in the hw_perf_event struct we avoid having to
pass around yet another array.

> Then again, with Robert's approach we have to mark all fixed purpose
> thingies as redo and we might hit some weird cases there as well, can't
> seem to get me brain straight on that case though.
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Borislav Petkov: "Re: [PATCH 1/4] lockdep: lock_set_subclass() fix"
Previous message: Paul Bolle: "[PATCH] [TRIVIAL] 8250_hp300: Fix warning typo 'CONFIG_8250'"
In reply to: Peter Zijlstra: "Re: [PATCH] perf_events: fix and improve x86 event scheduling"
Next in thread: Peter Zijlstra: "Re: [PATCH] perf_events: fix and improve x86 event scheduling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]