sched: deep power-saving states

From: Gregory Haskins
Date: Wed Oct 22 2008 - 09:40:29 EST


[Resending from my real account... .ml@gmail is for mailing list traffic
and I forgot to change the "from" field :P]

Hi Arjan,
I was giving some thought to that topic you brought up at our
LF-end-user session on RT w.r.t. deep power state wakeup adding latency.

As Steven mentioned, we currently have this thing called "cpupri"
(kernel/sched_cpupri.c) in the scheduler which allows us to classify
each core (on a per disjoint cpuset basis) as being either IDLE,
SCHED_OTHER, or RT1 - RT99. (Note that currently we lump both IDLE and
SCHED_OTHER together as SCHED_OTHER because we don't yet care to
differentiate between them, but I have patches to fix this that I can
submit).

What I was thinking is that a simple mechanism to quantify the
power-state penalty would be to add those states as priority levels in
the cpupri namespace. E.g., we could substitute IDLE-RUNNING for IDLE,
and add IDLE-PS1, IDLE-PS2, .. IDLE-PSn, OTHER, RT1, .. RT99. This
means the scheduler would favor waking an IDLE-RUNNING core over one in
IDLE-PS1 .. IDLE-PSn, etc. The question in my mind is: can the
power-states be determined in a static fashion such that we know what
value to assign to the idle state before we enter it? Or is it more
dynamic (e.g. the longer it is in an MWAIT, the deeper the sleep gets)?
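
To make the ordering concrete, here is a rough userspace sketch of what
I have in mind. This is not the actual cpupri code: the level names, the
count of three power states, and pick_cpu() are all made up for
illustration. A lower level means a cheaper target, so the search simply
prefers the lowest level that is still below the waking task's level.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IDLE_PS 3   /* assumed count of deep idle states, PS1..PS3 */
#define NR_RT      99  /* RT1..RT99, as described above */

/* Hypothetical level layout: lower value = preferred wakeup target. */
enum {
	LVL_IDLE_RUNNING = 0,
	LVL_IDLE_PS1     = 1,                        /* PS1..PSn follow */
	LVL_OTHER        = LVL_IDLE_PS1 + NR_IDLE_PS,
	LVL_RT1          = LVL_OTHER + 1,            /* RT1..RT99 follow */
	NR_LEVELS        = LVL_RT1 + NR_RT
};

/* Level for IDLE-PSn, n in 1..NR_IDLE_PS. */
static int idle_ps_level(int n)
{
	assert(n >= 1 && n <= NR_IDLE_PS);
	return LVL_IDLE_PS1 + n - 1;
}

/* Level for RTp, p in 1..NR_RT. */
static int rt_level(int prio)
{
	assert(prio >= 1 && prio <= NR_RT);
	return LVL_RT1 + prio - 1;
}

/*
 * Return the index of the CPU with the lowest level that is strictly
 * below task_level, or -1 if no CPU qualifies. Ties go to the lowest
 * CPU index.
 */
static int pick_cpu(const int *cpu_level, size_t ncpus, int task_level)
{
	int best = -1;

	for (size_t i = 0; i < ncpus; i++) {
		if (cpu_level[i] >= task_level)
			continue;
		if (best < 0 || cpu_level[i] < cpu_level[best])
			best = (int)i;
	}
	return best;
}
```

With CPUs at OTHER, IDLE-PS2, IDLE-RUNNING and RT10, an RT5 task would
land on the IDLE-RUNNING core; if that core were busy with SCHED_OTHER
work instead, the IDLE-PS2 core would be chosen next.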

If it's dynamic, is there a deterministic algorithm that could be applied
so that, say, a timer on a different CPU (bsp makes sense to me) could
advance the IDLE-PSx state in cpupri on behalf of the low-power core as
time goes on?
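
For the dynamic case, the kind of deterministic rule I am picturing is
something like the sketch below: a table of residency thresholds that
the timer on the other CPU consults to decide which IDLE-PSx level to
publish. The thresholds, struct, and function names here are invented
placeholders; whether real hardware behaves this predictably is exactly
the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical residency thresholds, in microseconds of continuous
 * idle time, after which the core is assumed to have reached PS1, PS2,
 * PS3. Must be in ascending order.
 */
static const uint64_t ps_enter_after_us[] = { 100, 1000, 10000 };
#define NR_PS_THRESHOLDS \
	(sizeof(ps_enter_after_us) / sizeof(ps_enter_after_us[0]))

/* Per-CPU bookkeeping the remote timer would read (illustrative). */
struct idle_cpu {
	uint64_t idle_since_us;  /* timestamp when the core went idle */
};

/*
 * Return 0 for IDLE-RUNNING or k for IDLE-PSk, based only on how long
 * the core has been idle at time now_us.
 */
static unsigned int idle_ps_at(const struct idle_cpu *cpu, uint64_t now_us)
{
	uint64_t idle_us;
	unsigned int ps = 0;

	assert(now_us >= cpu->idle_since_us);
	idle_us = now_us - cpu->idle_since_us;

	for (size_t i = 0; i < NR_PS_THRESHOLDS; i++) {
		if (idle_us >= ps_enter_after_us[i])
			ps = (unsigned int)(i + 1);
	}
	return ps;
}
```

The timer handler would call something like idle_ps_at() for each idle
core and update that core's cpupri level whenever the result changes.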

Thoughts?
-Greg

