Re: [5/11] issue 5: Frequency and uarch invariant task load

From: Peter Zijlstra
Date: Wed Jan 08 2014 - 07:31:48 EST


On Tue, Jan 07, 2014 at 04:19:41PM +0000, Morten Rasmussen wrote:
> Potential solution: Frequency invariance has been proposed before [1]
> where the task load is scaled by the cur/max freq ratio. Another
> possibility is to use hardware counters if such are available on the
> platform.
>
> [1] https://lkml.org/lkml/2013/4/16/289
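
For illustration only (this is not taken from the referenced patches), the
cur/max scaling mentioned above could look roughly like the sketch below.
The 10-bit fixed-point ratio and the names freq_scale() and
scale_load_by_freq() are assumptions made up for this example.

```c
#include <stdint.h>

/* Assumption: 10-bit fixed point, 1024 == running at max frequency. */
#define FREQ_SCALE_SHIFT 10

/* Hypothetical helper: cur/max frequency ratio in 1/1024 units. */
static inline uint32_t freq_scale(uint32_t cur_khz, uint32_t max_khz)
{
	/* Without a known max, fall back to "no scaling". */
	if (max_khz == 0)
		return 1u << FREQ_SCALE_SHIFT;

	return (uint32_t)(((uint64_t)cur_khz << FREQ_SCALE_SHIFT) / max_khz);
}

/* Hypothetical helper: scale a raw load contribution by cur/max. */
static inline uint64_t scale_load_by_freq(uint64_t raw_load,
					  uint32_t cur_khz, uint32_t max_khz)
{
	return (raw_load * freq_scale(cur_khz, max_khz)) >> FREQ_SCALE_SHIFT;
}
```

With this, a task that accumulates a raw load of 1024 while running at half
the max frequency contributes 512.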

Right, I just had a look at those patches.. they're not horrible but I
think they're missing a few opportunities.

My main objection to them is that I think the newly introduced
max_capacity is exactly what the current cpu_power thing is -- then
again, I still haven't let the entire thing sink in well enough.

Not to mention we need to fix some of the cpu_power abuse -- like the
correlation to capacity, which as stated in previous emails should be
sorted using utilization.

So DVFS certainly makes sense, and would indeed be required in order to
make sensible decisions in the face of P states. Even in the face of
funny hardware like Intel which pretty much ignores whatever you tell it
and does its own merry thing.


A few random thoughts:

- I think for SMP-nice we want to migrate from /max_capacity to
/curr_capacity; because SMP-nice cares about 100% utilization
regardless of the actual P state. If we're somehow forced into a
lower P state (thermal or otherwise) fairness is best served by
normalizing at the rate we're actually running at, not the potential
maximal.

- We need to re-think SMT and turbo-bins in general; I think we can
think of those two as the same effective thing. This does mean Intel
chips will have a dual layer of this goo, and we can currently barely
deal with the 1 SMT layer, let alone do something sensible with 2.

To clarify, a single SMT thread will generally go 'faster' on its own
since it doesn't need to compete with the other thread(s) for core
resources, but together they might better utilize the core resources
giving an over-all throughput win.

Similar for turbo bins, a single core can go faster on its own since
it doesn't have competition for energy and thermal constraints, but
together cores can probably achieve greater throughput.

So we need a better way to describe this capacity dependency and
variability.

I'm fairly sure ARM doesn't do SMT, but they certainly suffer from
thermal caps and can thus have effective turbo bins, even though
they're not explicit and magic like with Intel.

And of course the honorary mention goes to Power7 which has
asymmetric bins -- let's hope they fix it and nobody else thinks them
a great idea.

- For hardware without P state controls, or hardware that pretty much
ignores them, we need means of obtaining the max and curr capacity.

Intel has the APERF, MPERF registers which resp. count at actual
frequency and fixed frequency. Using them is a bit tricky since
APERF doesn't count when idle, but when filtering out the idle time
they do provide a current performance ratio.

From that we could obtain a max performance ratio by using a wide
window max on the current value or somesuch.

Again, SMT and turbo-bins will complicate matters..

Other CPUs that have magic P state control might not provide such
registers, in which case we'd have to burn PMU resources on this,
which would completely blow :/
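
As a rough sketch of the APERF/MPERF idea above: take deltas of both
counters over the same interval, use their ratio as the current
performance ratio, and keep a windowed max as the estimate of the max
ratio. Everything here (struct names, the 8-sample window, the 10-bit
fixed point) is an assumption for illustration. It also assumes the
samples are taken such that idle time is already excluded, and it only
does the arithmetic; reading the actual MSRs is left out.

```c
#include <stdint.h>

#define PERF_SHIFT	10	/* assumption: 1024 == MPERF reference frequency */
#define PERF_WINDOW	8	/* assumption: window length in samples */

/* One snapshot of the two counters (hypothetical layout). */
struct perf_sample {
	uint64_t aperf;		/* counts at actual frequency */
	uint64_t mperf;		/* counts at fixed frequency */
};

struct perf_tracker {
	struct perf_sample prev;
	uint32_t ratios[PERF_WINDOW];	/* ring buffer of recent ratios */
	unsigned int idx;
};

/*
 * Current performance ratio: delta(APERF) / delta(MPERF) since the
 * previous sample, in 1/1024 units. Returns 0 and records nothing when
 * MPERF did not advance (no non-idle time in the interval).
 */
static uint32_t perf_curr_ratio(struct perf_tracker *t, struct perf_sample now)
{
	uint64_t da = now.aperf - t->prev.aperf;
	uint64_t dm = now.mperf - t->prev.mperf;
	uint32_t ratio;

	t->prev = now;
	if (dm == 0)
		return 0;

	/* Assumes da is small enough that the shift cannot overflow. */
	ratio = (uint32_t)((da << PERF_SHIFT) / dm);
	t->ratios[t->idx] = ratio;
	t->idx = (t->idx + 1) % PERF_WINDOW;
	return ratio;
}

/* Max performance ratio: largest ratio seen within the window. */
static uint32_t perf_max_ratio(const struct perf_tracker *t)
{
	uint32_t max = 0;
	unsigned int i;

	for (i = 0; i < PERF_WINDOW; i++)
		if (t->ratios[i] > max)
			max = t->ratios[i];
	return max;
}
```

A ratio above 1024 here would correspond to running above the reference
frequency (turbo); how wide the window needs to be for the max to be
meaningful is an open question.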

