Re: round-robining per-cpu counters
From: Ingo Molnar
Date:  Tue May 05 2009 - 02:41:36 EST
* Paul Mackerras <paulus@xxxxxxxxx> wrote:
> It used to be, and as far as I can see still is, the case that 
> per-cpu counters take priority over per-task counters by virtue of 
> being scheduled in first.  That is, if you have N hardware 
> counters and >= N per-cpu counters, then no per-task counters will 
> ever get scheduled onto the PMU.
> 
> That being the case, I don't see what the point of having the 
> perf_reserved_percpu variable is.  It doesn't do anything except 
> set cpuctx->max_pertask, which isn't actually used anywhere.  In 
> any case with the current counter scheduling system there's no 
> need to "reserve" hardware counters for use by per-cpu counters 
> since any new per-cpu counters will just bump existing per-task 
> counters off - if not immediately then the next time that 
> perf_counter_task_tick gets called.
> 
> What was the intended meaning of perf_reserved_percpu?  I presume 
> it was that there would always be that many hardware counters 
> available for per-cpu counters regardless of how many per-task 
> counters there are.  But that doesn't answer the complementary 
> question - how many hardware counters can we rely on being 
> available for per-task counters?  At the moment the answer is 0, 
> but I don't think that is a good answer.
> 
> Does anyone have any good ideas about what the scheduling policy 
> should be?

The reservation mechanism really suffered from not being used by
anything or anyone, and it thus bit-rotted across 300 follow-on
commits.

What would be the primary use case? Allow the admin to set aside
(and guarantee) space for task counters? Allow the admin to 'force'
exclusivity of counter ownership?

I think a better general solution would be to have a single
round-robin list for all currently active counters (both per-cpu and
per-task counters) and to round-robin all of them fairly. The scaling
information makes it obvious when this is happening.
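
As a rough illustration of the single round-robin list, here is a toy
user-space model. It is not kernel code: the struct, the function names,
the fixed per-tick event rate and the tick granularity are all
assumptions made for this sketch. It only shows that per-cpu and
per-task counters share one rotation, and that comparing time enabled
with time running exposes the multiplexing.

```c
#include <assert.h>

/* Toy model only: every name here is hypothetical, not a kernel structure. */
struct counter {
	const char *name;
	int per_cpu;                 /* 1 = per-cpu, 0 = per-task; the rotation ignores it */
	unsigned long long raw;      /* events counted while on the PMU */
	unsigned long long enabled;  /* ticks the counter was active */
	unsigned long long running;  /* ticks the counter actually held a hardware slot */
};

/*
 * Rotate one shared list of n counters over nr_hw hardware slots.
 * On each tick the nr_hw counters starting at 'head' run, then the
 * head advances by one, so every counter gets the same share.
 */
void rr_simulate(struct counter *c, int n, int nr_hw, int ticks,
		 unsigned long long events_per_tick)
{
	int head = 0;

	assert(n > 0 && nr_hw > 0);
	for (int t = 0; t < ticks; t++) {
		for (int i = 0; i < n; i++) {
			int pos = (i - head + n) % n;  /* distance from list head */

			c[i].enabled++;
			if (pos < nr_hw) {
				c[i].running++;
				c[i].raw += events_per_tick;
			}
		}
		head = (head + 1) % n;
	}
}

/* Estimate the full count by scaling raw by enabled/running. */
unsigned long long rr_scaled(const struct counter *c)
{
	return c->running ? c->raw * c->enabled / c->running : 0;
}
```

With 5 counters, 2 slots and 10 ticks at 100 events per tick, every
counter in this model ends up with enabled = 10, running = 4 and
raw = 400, so the scaled estimate is 1000. A counter whose running time
is less than its enabled time is the visible sign that it was
multiplexed.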

If the admin wants stronger ownership of counters then the
pinned/exclusive attribute can be used.

We really want to keep the counter-scheduler simple, and we also
want to make the default as permissive as possible.

	Ingo