Re: [RFC] (How to) Let idle CPUs sleep

From: Nick Piggin
Date: Fri May 13 2005 - 03:29:06 EST


Ingo Molnar wrote:

the power equation is really easy: the implicit cost of a deep CPU sleep is say 1-2 msecs. (that's how long it takes to shut the CPU and the bus down, etc.) If we do an exponential backoff we periodically re-wake the CPU fully up again - wasting 1-2msec (or more) more power. With the watchdog solution we have more overhead on the busy CPU but it takes _much_ less power for a truly idle CPU to be turned off. [the true 'effective cost' all depends on the scheduling pattern as well, but the calculation before is still valid.] Whatever the algorithmic overhead of the watchdog code, it's dwarved by the power overhead caused by false idle-wakeups of CPUs under exponential backoff.


Well, it really depends on how it is implemented, and what tradeoffs
you make.

Let's say that you don't start deep sleeping until you've backed off
to 64ms rebalancing. Now the CPU power consumption is reduced to less
than 2% of ideal.

Now we don't have to worry about uniprocessor, and SMP systems that
go *completely* idle can have mechanism to indefinitely deep sleep
all CPUs until there is real work.

What you're left with are SMP systems with *some* activity happening,
and of those, I bet most idle CPUs will have other reasons to be woken
up other than the scheduler tick anyway.

And don't forget that the watchdog approach can just as easily deep
sleep a CPU only to immediately wake it up again if it detects an
imbalance.

So in terms of real, system wide power savings, I'm guessing the
difference would really be immesurable.

And the CPU usage / wakeup cost arguments cut both ways. The busy
CPUs have to do extra work in the watchdog case.

the watchdog solution - despite being more complex - is also more orthogonal in that it does not change the balancing decisions at all - they just get offloaded to another CPU. The exponential backoff OTOH materially changes how we do SMP balancing - which might or might not matter much, but it will always depend on circumstances. So in the long run the watchdog solution is probably easier to control. (because it's just an algorithm offload, not a material scheduling feature.)


Well so does the watchdog, really. But it's probably not like you have
to *really* tune sleep algorithms _exactly_ right, yeah? So long as you
get within even 5% of total theoretical power saving on SMP systems,
it's likely good enough.

so unless there are strong implementational arguments against the watchdog solution, i definitely think it's the higher quality solution, both in terms of power savings, and in terms of impact.


I'm think power savings will be unmeasurable between the two approaches,
backoff will be quite a lot less complex, and have less impact on CPUs
that are busy doing real work.

Smells like a bakeoff coming up :)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/