Re: [RFC v1] Tunable sched_mc_power_savings=n

From: Dipankar Sarma
Date: Thu Jun 26 2008 - 17:03:39 EST


On Thu, Jun 26, 2008 at 10:17:00PM +0200, Andi Kleen wrote:
> Vaidyanathan Srinivasan wrote:
> > System management software and workload monitoring and managing
> > software can potentially control the tunable on behalf of the
> > applications for best overall power savings and performance.
>
> Does it have the needed information for that? e.g. real time information
> on what the system does? I don't think anybody is in a better position
> to control that than the kernel.

Some workload managers already do that - they provision CPU and memory
resources based on request rates and response times. Such software is
in a better position to decide whether it can live with the reduced
performance of a power saving mode or not. The point I am
making is that the kernel doesn't have any notion of transactional
performance - so if an administrator wants to run unimportant
transactions on a slower but low-power system, he/she should have
the option of doing so.
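
To make that concrete, here is a rough sketch of the kind of decision
such a workload manager could make. Everything in it is hypothetical:
the function name, the thresholds and the number of levels are
illustrative assumptions, not part of the proposed patch.

```python
def choose_power_level(response_ms, target_ms, max_level=2, headroom=0.5):
    """Pick a hypothetical power-savings level from observed vs. target response time.

    All thresholds here are illustrative assumptions, not kernel policy.
    """
    if response_ms >= target_ms:
        return 0                  # target missed: favour performance
    if response_ms <= target_ms * headroom:
        return max_level          # plenty of slack: save as much power as allowed
    return min(1, max_level)      # some slack: conservative savings only
```

The kernel cannot make this choice by itself because it never sees
response_ms or target_ms; only the management software does.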

> > Applications with conflicting goals should resolve among themselves.
>
> That sounds wrong to me. Negotiating between conflicting requirements
> from different applications is something that kernels are supposed
> to do.

Agreed. However, that is a difficult problem to solve and not the
intention of this idea. A global power setting is a simple first step.
I don't think we have a good understanding of cases where conflicting
power requirements from multiple applications need to be addressed.
We will have to look at that when the issue arises.

> > In a small-scale datacenters, peak and off-peak hour settings can be
> > potentially done through simple cron jobs.
>
> Is there any real drawback from only controlling it through nice levels?

In a system with more than a couple of sockets, it is more beneficial
(power-wise) to pack all work into a small number of processors
and let the other processors go to very low power sleep than to
run tasks slowly and spread them all over the processors.
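
A toy model of the packing argument, for illustration only. The wattage
figures are made-up placeholders rather than measurements, and the model
ignores frequency scaling and any performance cost of packing; the
function names are assumptions of this sketch.

```python
def spread(num_tasks, num_packages):
    """Distribute tasks as evenly as possible across all packages."""
    base, extra = divmod(num_tasks, num_packages)
    return [base + (1 if i < extra else 0) for i in range(num_packages)]


def pack(num_tasks, num_packages, cores_per_package):
    """Fill one package completely before using the next one."""
    layout, remaining = [], num_tasks
    for _ in range(num_packages):
        busy = min(remaining, cores_per_package)
        layout.append(busy)
        remaining -= busy
    return layout


def total_power(layout, core_watts=20.0, package_watts=30.0, sleep_watts=5.0):
    """Sum power over packages; a package with no busy cores sleeps.

    The wattage defaults are illustrative placeholders, not real data.
    """
    total = 0.0
    for busy in layout:
        if busy == 0:
            total += sleep_watts
        else:
            total += package_watts + busy * core_watts
    return total
```

With 4 tasks on 4 quad-core packages, spread() keeps every package awake
while pack() leaves three of them idle, so under these assumed numbers
the packed layout comes out well ahead.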

> Anyways I think the main thing I object to in your proposal is that
> your tunable is system global, not per process. I'm also not
> sure if a tunable is really a good idea and if the kernel couldn't
> do a better job.

While it would be nice to have a per process tunable, I am not sure
we are ready for that yet. A global setting is easy to implement
and we have immediate use for it. The kernel already does a decent
job conservatively - by packing one task per core in a package
when sched_mc_power_savings=1 is set. Any further packing may affect
performance and therefore should not be the default behavior.
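
For reference, a minimal sketch of how management software (or the cron
job mentioned earlier in the thread) might flip the global setting. The
sysfs path below is an assumption about where the tunable is exposed and
should be checked on the running kernel; the helper names are
hypothetical, and the range of accepted values depends on the kernel.

```python
from pathlib import Path

# Assumed location of the tunable; verify it exists on your kernel.
DEFAULT_TUNABLE = Path("/sys/devices/system/cpu/sched_mc_power_savings")


def set_power_savings(level, tunable=DEFAULT_TUNABLE):
    """Write the requested level to the tunable (normally requires root)."""
    if level < 0:
        raise ValueError("power savings level must be non-negative")
    Path(tunable).write_text(f"{level}\n")


def get_power_savings(tunable=DEFAULT_TUNABLE):
    """Read the current level back as an integer."""
    return int(Path(tunable).read_text().strip())
```

An off-peak job would call set_power_savings(1) and the matching peak-hour
job set_power_savings(0); the tunable argument exists only so the helpers
can be tried against an ordinary file.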

Thanks
Dipankar