[discussion]sched: a rough proposal to enable power saving in scheduler

From: Alex Shi
Date: Mon Aug 13 2012 - 08:21:02 EST


Since there is currently no power saving consideration in the CFS
scheduler, I have a very rough idea for enabling a new power saving
scheme in CFS.

It is based on the following assumptions:
1, If many tasks crowd the system, letting only the CPUs of a few
domains run while the others idle cannot save power. Letting all CPUs
take the load, finish the tasks early, and then go idle will save more
power and give a better user experience.

2, Scheduling domains and scheduling groups match the hardware, and
thus the power consumption units. So pulling all tasks out of a domain
means that power consumption unit can potentially go idle.

So, following what Peter mentioned in commit 8e7fbcbc22c ("sched:
Remove stale power aware scheduling"), this proposal will adopt the
sched_balance_policy concept and use 2 kinds of policy: performance
and power.

In scheduling, 2 places will care about the policy: load_balance() and
the task fork/exec path, select_task_rq_fair().
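As a minimal sketch of the policy knob and of assumption 1 above, the
fallback from power to performance when the domain is over capacity
could look like the following. All identifiers here are hypothetical;
the real patch set would define its own names:

```c
/* Hypothetical policy knob; the real patch would define its own names. */
enum sched_balance_policy {
	SCHED_POLICY_PERFORMANCE,
	SCHED_POLICY_POWER,
};

/*
 * Per assumption 1: the power policy only applies while the domain has
 * spare capacity. Once there are more runnable tasks than the domain
 * can hold, spread like the performance policy so all tasks finish
 * early and the whole unit can go idle.
 */
static enum sched_balance_policy
effective_policy(enum sched_balance_policy requested,
		 unsigned int sd_nr_running, unsigned int sd_capacity)
{
	if (sd_nr_running > sd_capacity)
		return SCHED_POLICY_PERFORMANCE;
	return requested;
}
```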

Here is some pseudo code that tries to explain the proposed behaviour
in load_balance() and select_task_rq_fair():


load_balance() {
	update_sd_lb_stats(); //get busiest group, idlest group data.

	if (sd->nr_running > sd's capacity) {
		//power saving policy is not suitable for this
		//scenario; run like the performance policy
		move tasks from busiest cpu in busiest group to
		idlest cpu in idlest group;
	} else { //the sd has enough capacity to hold all tasks.
		if (sg->nr_running > sg's capacity) {
			//imbalance between groups
			if (schedule policy == performance) {
				//when 2 busiest groups are equally
				//busy, prefer the one with the
				//softest group??
				move tasks from busiest group to
				idlest group;
			} else if (schedule policy == power) {
				move tasks from busiest group to
				idlest group until busiest is just
				full of capacity;
				//the busiest group can then balance
				//internally on the next LB
			}
		} else {
			//all groups have enough capacity for their tasks.
			if (schedule policy == performance) {
				//all tasks may have enough cpu
				//resources to run;
				//move tasks from busiest to idlest group?
				//no, at this point it is better to keep
				//each task on its current cpu,
				//so it is better to balance
				//within each of the groups
				for_each_imbalanced_group()
					move tasks from busiest cpu to
					idlest cpu within the group;
			} else if (schedule policy == power) {
				if (no hard pin in idlest group)
					move tasks from idlest group to
					busiest until busiest is full;
				else
					move unpinned tasks to the
					biggest hard-pinned group;
			}
		}
	}
}
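The decision tree above can be sketched as a pure function. This is
only an illustration of the proposed branching, not an implementation:
all names are invented, and hard-pinning is reduced to a single flag
for brevity:

```c
enum lb_policy { LB_PERFORMANCE, LB_POWER };

enum lb_action {
	LB_SPREAD_CPUS,      /* busiest cpu -> idlest cpu, across groups */
	LB_SPREAD_GROUPS,    /* busiest group -> idlest group */
	LB_FILL_BUSIEST,     /* unload busiest group only down to its capacity */
	LB_BALANCE_IN_GROUP, /* balance cpus within each group */
	LB_PACK_BUSIEST,     /* drain idlest group into busiest */
	LB_MOVE_UNPINNED,    /* move unpinned tasks to biggest pinned group */
};

static enum lb_action
pick_lb_action(enum lb_policy policy,
	       int sd_over_capacity,    /* sd->nr_running > sd capacity */
	       int sg_over_capacity,    /* some sg->nr_running > sg capacity */
	       int idlest_has_hard_pin)
{
	if (sd_over_capacity)
		/* power saving cannot help here; behave like performance */
		return LB_SPREAD_CPUS;

	if (sg_over_capacity)
		return policy == LB_POWER ? LB_FILL_BUSIEST
					  : LB_SPREAD_GROUPS;

	/* every group can hold its own tasks */
	if (policy == LB_PERFORMANCE)
		return LB_BALANCE_IN_GROUP;

	return idlest_has_hard_pin ? LB_MOVE_UNPINNED : LB_PACK_BUSIEST;
}
```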

select_task_rq_fair()
{
	for_each_domain(cpu, tmp) {
		if (policy == power && tmp_has_capacity &&
		    tmp->flags & sd_flag) {
			sd = tmp;
			//it is fine to get a cpu in this domain
			break;
		}
	}

	while (sd) {
		if (policy == power)
			find_busiest_and_capable_group();
		else
			find_idlest_group();
		if (!group) {
			sd = sd->child;
			continue;
		}
		...
	}
}
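The intent of the domain walk for the power policy can be sketched as:
stop at the first (lowest-level) domain that still has spare capacity,
so a forked or exec'd task is packed into an already-active power
unit. Here domains are modelled as parallel arrays ordered from lowest
to highest level; the function name and representation are invented
for illustration only:

```c
static int pick_power_domain(const unsigned int nr_running[],
			     const unsigned int capacity[],
			     int nr_domains)
{
	int i;

	for (i = 0; i < nr_domains; i++) {
		if (nr_running[i] < capacity[i])
			return i; /* fine to get a cpu in this domain */
	}
	/* no capacity anywhere: fall back to the performance path */
	return -1;
}
```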

sub proposals:
1, If possible, balance tasks onto the idlest cpu directly, not only
onto the appointed 'balance cpu'; that could save one more round of
balancing. For the idlest cpu, prefer a newly idle cpu first, then the
least loaded cpu.
2, se or task load (weight) is good for setting running time, but it
should be the second basis in load balancing. The first basis of LB is
the number of running tasks in a group/cpu: whatever the weight of a
group is, if its task number is less than its cpu number, the group
still has capacity to take more tasks. (SMT cpu power and big/little
cpu capacity on ARM still need to be considered.)
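Sub proposal 2 could be sketched as follows: capacity is judged by
task count first, with weighted load only as a tie-breaker. SMT and
big/little capacity scaling are left aside as noted above; the struct
and function names are invented for illustration:

```c
struct group_stat {
	unsigned int nr_running;	/* runnable tasks in the group */
	unsigned int nr_cpus;		/* cpus in the group */
	unsigned long load;		/* weighted se load, the second basis */
};

/* A group can take more tasks while it runs fewer tasks than it has
 * cpus, whatever its weighted load is (the first basis). */
static int group_has_capacity(const struct group_stat *g)
{
	return g->nr_running < g->nr_cpus;
}

/* Returns >0 if a is busier than b: task count first, load second. */
static long busier_than(const struct group_stat *a,
			const struct group_stat *b)
{
	if (a->nr_running != b->nr_running)
		return (long)a->nr_running - (long)b->nr_running;
	return (long)(a->load > b->load) - (long)(a->load < b->load);
}
```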

unsolved issues:
1, Like the current scheduler, it does not handle cpu affinity well in
load_balance().
2, Task groups are not considered well in this rough proposal.

This proposal is not fully thought through and may contain mistakes.
So I am just sharing my ideas, and I hope they can become better and
workable through your comments and discussion.

Thanks
Alex
--