Re: [RFC][PATCH 0/9] sched: Power scheduler design proposal

From: Catalin Marinas
Date: Wed Jul 17 2013 - 10:16:18 EST


On Tue, Jul 16, 2013 at 04:23:08PM +0100, Arjan van de Ven wrote:
> On 7/16/2013 5:42 AM, Catalin Marinas wrote:
> > Morten's power scheduler tries to address the above and it will grow
> > into controlling a new model of power driver (and taking into account
> > Arjan's and others' comments regarding the API). At the same time, we
> > need some form of task packing. The power scheduler can drive this
> > (currently via cpu_power) or can simply turn a knob if there are better
> > options that will be accepted in the scheduler.
>
> how much would you be helped if there was a simple switch
>
> sort left versus sort right
>
> (assuming the big cores are all either low or high numbers)

It helps a bit compared to the current behaviour, but there is still a lot
of room for improvement.

> the sorting is mostly statistical, but that's good enough in practice..
> each time a task wakes up, you get a bias towards either low or high
> numbered idle cpus

If the cores within a cluster (socket) are not individually power-gated
(this is implementation-dependent), it makes more sense to spread the
tasks among the cores, either to run at a lower frequency or simply to
reach idle sooner. Little cores don't consume much even when they are
individually power-gated, so there we would rather spread the tasks
equally.
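A minimal C sketch of the spreading rule above, under stated assumptions: `struct cluster`, its fields and the load units are hypothetical (not kernel structures). The sketch spreads when a cluster's cores cannot be power-gated individually, or when they are little cores, by placing the task on the least-loaded CPU. The packing alternative for individually gated big cores is not shown.

```c
#include <assert.h>

#define MAX_CPUS_PER_CLUSTER 8

/* Hypothetical description of one cluster (socket); not a kernel structure. */
struct cluster {
	int first_cpu;       /* logical number of the cluster's first CPU */
	int nr_cpus;         /* number of CPUs in the cluster */
	int per_core_gating; /* nonzero if cores can be power-gated individually */
	int is_little;       /* nonzero for a cluster of little cores */
	unsigned int load[MAX_CPUS_PER_CLUSTER]; /* per-CPU load, arbitrary units */
};

/* Spread when cores share a power domain, or when they are little cores. */
static int should_spread(const struct cluster *c)
{
	return !c->per_core_gating || c->is_little;
}

/* Return the least-loaded CPU in the cluster (the first one wins on ties). */
static int spread_within_cluster(const struct cluster *c)
{
	int best = 0;

	assert(c->nr_cpus > 0 && c->nr_cpus <= MAX_CPUS_PER_CLUSTER);
	for (int i = 1; i < c->nr_cpus; i++)
		if (c->load[i] < c->load[best])
			best = i;
	return c->first_cpu + best;
}
```

Picking the least-loaded CPU keeps per-core load, and therefore the frequency the cluster needs, as low as possible.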

> very quickly all tasks will be on one side, unless your system is so
> loaded that all cpus are full.

It should be more like left socket vs both sockets with the possibility
of different balancing within a socket. But then we get back to the
sched_smt/sched_mc power aware scheduling that was removed from the
kernel.
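Arjan's "sort left versus sort right" switch could be sketched as a wakeup-time choice between the lowest- and highest-numbered idle CPU. Like his suggestion, this assumes the big cores are all either low- or high-numbered. Idle CPUs are represented by a hypothetical 64-bit mask rather than the kernel's cpumask API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the lowest-numbered idle CPU when sort_left is nonzero,
 * otherwise the highest-numbered one; -1 if no CPU is idle.
 * Bit i of idle_mask set means CPU i is idle (hypothetical representation).
 */
static int pick_idle_cpu(uint64_t idle_mask, int nr_cpus, int sort_left)
{
	assert(nr_cpus > 0 && nr_cpus <= 64);
	if (sort_left) {
		for (int cpu = 0; cpu < nr_cpus; cpu++)
			if (idle_mask & (UINT64_C(1) << cpu))
				return cpu;
	} else {
		for (int cpu = nr_cpus - 1; cpu >= 0; cpu--)
			if (idle_mask & (UINT64_C(1) << cpu))
				return cpu;
	}
	return -1;
}
```

Applied at every wakeup, this biases tasks statistically towards one end of the CPU range. It has no notion of sockets or clusters, so it cannot express "left socket vs. both sockets" or a different policy within a socket.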

It also matters when the decision to sort left vs. right is made, since
we want to avoid migrating threads unnecessarily. There could be small
threads (e.g. an mp3 decoding thread) that should stay on the little
core.
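One common way to avoid needless migrations is hysteresis. A task moves up to a big core only when its tracked load exceeds an upper threshold. It moves back down only when its load drops below a lower threshold. A small, steady thread such as an mp3 decoder therefore stays on the little core. This is an illustrative assumption, not part of the proposal, and the thresholds and the 0..1024 load scale are hypothetical.

```c
#include <assert.h>

enum core_type { LITTLE_CORE, BIG_CORE };

/* Hypothetical thresholds on a 0..1024 load scale. */
#define UP_THRESHOLD   700
#define DOWN_THRESHOLD 256

/*
 * Decide which core type a task should run on, given its current core
 * type and tracked load. Loads between the two thresholds leave the
 * task where it is, which avoids ping-ponging between clusters.
 */
static enum core_type next_core_type(enum core_type cur, unsigned int load)
{
	assert(DOWN_THRESHOLD < UP_THRESHOLD);
	if (cur == LITTLE_CORE && load > UP_THRESHOLD)
		return BIG_CORE;
	if (cur == BIG_CORE && load < DOWN_THRESHOLD)
		return LITTLE_CORE;
	return cur;
}
```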

Power-aware scheduling should not hurt performance (i.e. benchmark
results), but the scheduler could take power implications into account.
The hard part is formalising this given the differences between
architectures and SoCs. Maybe a low-level driver or arch hook like "get
me the most power-efficient CPU that can run this task" would help, but
it's not clear how this would work (we can't easily predict what the
future load will be).
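A hypothetical hook of the kind described might look like the sketch below. Each CPU advertises a capacity, its current load and a relative power cost, and the hook returns the cheapest CPU with enough spare capacity for the task. Because future load cannot be predicted, the sketch simply uses the task's current tracked load as the estimate, which is exactly its weak point. `struct cpu_desc`, `most_efficient_cpu()` and all units are assumptions, not an existing kernel interface.

```c
#include <assert.h>

/* Hypothetical per-CPU description supplied by a low-level driver. */
struct cpu_desc {
	unsigned int capacity;   /* maximum load the CPU can sustain */
	unsigned int cur_load;   /* load already placed on this CPU */
	unsigned int power_cost; /* relative energy cost per unit of work */
};

/*
 * Return the index of the lowest-cost CPU with enough spare capacity
 * for task_load (the task's current load, used as a guess of its future
 * load), or -1 if none fits. Ties go to the lower index.
 */
static int most_efficient_cpu(const struct cpu_desc *cpus, int nr_cpus,
			      unsigned int task_load)
{
	int best = -1;

	assert(nr_cpus > 0);
	for (int i = 0; i < nr_cpus; i++) {
		const struct cpu_desc *c = &cpus[i];

		/* Skip CPUs that are already full or lack room for the task. */
		if (c->cur_load > c->capacity ||
		    c->capacity - c->cur_load < task_load)
			continue;
		if (best < 0 || c->power_cost < cpus[best].power_cost)
			best = i;
	}
	return best;
}
```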

Our proposal is to split the balancing into two problems: equal
balancing vs. CPU capacity (the latter can be improved to address arch
concerns). These two problems can be unified later, once we have a
better understanding of their implications across architectures.
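To illustrate the split, here is a sketch built on assumptions throughout; it is not code from the patch set. A power-side policy, not shown, is assumed to have already assigned each CPU a capacity, with 0 for CPUs it wants kept idle. The balancer below then only distributes the total load in proportion to those capacities. The capacity array is the only interface between the two halves. This loosely mirrors the power scheduler driving task packing through cpu_power. `balance_by_capacity()` is a hypothetical name.

```c
#include <assert.h>

/*
 * Balancing half of the split: given per-CPU capacities chosen by the
 * power policy (0 = keep this CPU idle), compute each CPU's target share
 * of total_load in proportion to its capacity. The remainder left by
 * integer division goes to the first CPU with nonzero capacity.
 */
static void balance_by_capacity(const unsigned int *capacity, int nr_cpus,
				unsigned long total_load, unsigned long *target)
{
	unsigned long cap_sum = 0, assigned = 0;
	int first = -1;

	for (int i = 0; i < nr_cpus; i++) {
		cap_sum += capacity[i];
		if (first < 0 && capacity[i] > 0)
			first = i;
	}
	assert(cap_sum > 0);
	for (int i = 0; i < nr_cpus; i++) {
		target[i] = total_load * capacity[i] / cap_sum;
		assigned += target[i];
	}
	target[first] += total_load - assigned;
}
```

Improving the capacity side for a given architecture then does not require touching the equal-balancing code, and vice versa.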

For big.LITTLE we could work around the scheduler (in a very hacky way)
with a combination of pstate/powerclamp drivers that force idle on the
big cores when they are not needed, but I would rather have the
scheduler make such decisions.

--
Catalin