Re: [RFC][PATCH v5 00/14] sched: packing tasks

From: Catalin Marinas
Date: Mon Nov 11 2013 - 13:33:31 EST


On Mon, Nov 11, 2013 at 04:54:54PM +0000, Morten Rasmussen wrote:
> On Mon, Nov 11, 2013 at 11:33:45AM +0000, Catalin Marinas wrote:
> > I would rather start by defining the main goal and working backwards
> > to an algorithm. We may as well find that task packing based on this
> > patch set is sufficient but we may also get packing-like behaviour as
> > a side effect of a broader approach (better energy cost awareness). An
> > important aspect even in the mobile space is keeping the performance
> > as close as possible to the standard scheduler while saving a bit more
>
> With the exception of big.LITTLE where we want to out-perform the
> standard scheduler while saving power.

Good point. Maybe we should start with a separate set of patches for
improving the performance on asymmetric configurations like big.LITTLE
while ignoring (deferring) the power aspect. Things like placing bigger
threads on bigger CPUs and so on (you know better what's needed here ;).
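
To make "bigger threads on bigger CPUs" a bit more concrete, here is a
naive user-space sketch. Everything in it is hypothetical: struct
cpu_desc, the 0..1024 capacity scale and pick_cpu() are made up for
illustration and are not taken from this patch set or from the
big.LITTLE patches. It only shows the placement rule, nothing more.

```c
#include <stddef.h>

/* Hypothetical per-CPU description; not an existing kernel structure. */
struct cpu_desc {
	int id;
	unsigned int capacity;	/* assumed scale: 1024 == biggest CPU */
	unsigned int load;	/* load already placed here, same scale */
};

/*
 * Return the id of the smallest-capacity CPU whose spare capacity still
 * fits task_load. If nothing fits, fall back to the CPU with the most
 * spare capacity. Returns -1 when no CPUs are given.
 */
int pick_cpu(const struct cpu_desc *cpus, size_t n, unsigned int task_load)
{
	int best = -1, fallback = -1;
	unsigned int best_cap = 0, max_spare = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int spare = cpus[i].capacity > cpus[i].load ?
				     cpus[i].capacity - cpus[i].load : 0;

		/* Track the least loaded CPU in case nothing fits. */
		if (fallback < 0 || spare > max_spare) {
			fallback = cpus[i].id;
			max_spare = spare;
		}

		/* Prefer the smallest CPU that can still take the task. */
		if (spare >= task_load &&
		    (best < 0 || cpus[i].capacity < best_cap)) {
			best = cpus[i].id;
			best_cap = cpus[i].capacity;
		}
	}

	return best >= 0 ? best : fallback;
}
```

With two little CPUs (capacity 430) and two big ones (capacity 1024), a
task with load 600 cannot fit on a little CPU and ends up on a big one,
while a task with load 200 stays on a little CPU as long as one has
room.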

> > My understanding from the recent discussions is that the scheduler
> > should decide directly on the C-state (or rather the deepest C-state
> > possible since we don't want to duplicate the backend logic for
> > synchronising CPUs going up or down). This means that the scheduler
> > needs to know about C-state target residency, wake-up latency (I think
> > we can leave coupled C-states to the backend, there is some complex
> > synchronisation which I wouldn't duplicate).
>
> It would be nice and simple to hide the complexity of the coupled
> C-states, but we would lose the ability to prefer waking up cpus in a
> cluster/package that already has non-idle cpus over cpus in a
> cluster/package that has entered the coupled C-state. If we just know
> the requested C-state of a cpu we can't tell the difference as it is
> now.

I agree, we can't rely on the requested C-state; we need the _actual_
state, and this means querying the hardware driver. Can we abstract this
via some interface which provides the cost of waking up a CPU? This
could take into account the state of the other CPUs in the cluster, so
the scheduler is simply concerned with the wake-up costs.
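
Roughly what I have in mind, as a user-space sketch only. None of these
names exist in the kernel: enum idle_state, struct cluster and
cpu_wakeup_cost() are assumptions for illustration, and the cost values
are arbitrary units that a real driver would have to supply.

```c
#include <stddef.h>

/* Hypothetical idle states as actually entered, not as requested. */
enum idle_state { CPU_RUNNING, CPU_WFI, CPU_OFF };

/* Hypothetical view of one cluster as reported by the power driver. */
struct cluster {
	size_t nr_cpus;
	const enum idle_state *cpu_state;	/* actual state per CPU */
	unsigned int wfi_cost;			/* arbitrary cost units */
	unsigned int cpu_off_cost;
	unsigned int cluster_off_cost;		/* whole cluster powered down */
};

/* The cluster counts as down only when every CPU in it is off. */
static int cluster_is_down(const struct cluster *c)
{
	size_t i;

	for (i = 0; i < c->nr_cpus; i++)
		if (c->cpu_state[i] != CPU_OFF)
			return 0;
	return 1;
}

/*
 * Cost of waking up one CPU. An off CPU in a cluster that still has a
 * CPU out of the off state is cheaper to wake than one in a cluster
 * that has gone down completely (the coupled C-state case).
 */
unsigned int cpu_wakeup_cost(const struct cluster *c, size_t cpu)
{
	switch (c->cpu_state[cpu]) {
	case CPU_RUNNING:
		return 0;
	case CPU_WFI:
		return c->wfi_cost;
	case CPU_OFF:
		return cluster_is_down(c) ? c->cluster_off_cost
					  : c->cpu_off_cost;
	}
	return 0;
}
```

The scheduler side would only ever compare the returned costs; whether
a cluster is in a coupled state stays hidden behind the driver.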

> > Alternatively (my preferred approach), we get the scheduler to predict
> > and pass the expected residency and latency requirements down to a
> > power driver and read back the actual C-states for making task
> > placement decisions. Some of the menu governor prediction logic could
> > be turned into a library and used by the scheduler. Basically what
> > this tries to achieve is better scheduler awareness of the current
> > C-states decided by a cpuidle/power driver based on the scheduler
> > constraints.
>
> It might be easier to deal with the coupled C-states using this approach.

We already have drivers taking care of the coupled C-states, so it means
passing the information back to the scheduler in some way (actual
C-state or wake-up cost).

It would be nice if we could describe the wake-up costs statically while
still accounting for coupled C-states, but that needs more thinking.

--
Catalin