Re: [discussion]sched: a rough proposal to enable power saving in scheduler
From: Alex Shi
Date: Tue Aug 14 2012 - 03:35:15 EST
On 08/13/2012 08:21 PM, Alex Shi wrote:
> Since CFS currently has no power saving consideration, I have a
> very rough idea for enabling a new power saving scheme in CFS.
>
> It is based on the following assumptions:
> 1, If many tasks are crowded in the system, keeping only a few domains'
> cpus running while the other cpus idle cannot save power. Letting all
> cpus take the load, finish the tasks early, and then go idle will save
> more power and give a better user experience.
>
> 2, Schedule domains and schedule groups perfectly match the hardware
> and the power consumption units. So, pulling tasks out of a domain
> potentially lets that power consumption unit go idle.
>
> So, following what Peter mentioned in commit 8e7fbcbc22c (sched: Remove
> stale power aware scheduling), this proposal will adopt the
> sched_balance_policy concept and use 2 kinds of policy: performance and
> power.
>
> In scheduling, 2 places will care about the policy: load_balance(), and
> select_task_rq_fair() on task fork/exec.
Any comments on this rough proposal, especially on the assumptions?
>
> Here is some pseudo code that tries to explain the proposed behaviour in
> load_balance() and select_task_rq_fair():
>
>
> load_balance() {
>     update_sd_lb_stats(); //get busiest group, idlest group data.
>
>     if (sd->nr_running > sd's capacity) {
>         //power saving policy is not suitable for
>         //this scenario, it runs like performance policy
>         mv tasks from busiest cpu in busiest group to
>         idlest cpu in idlest group;
>     } else { //the sd has enough capacity to hold all tasks.
>         if (sg->nr_running > sg's capacity) {
>             //imbalance between groups
>             if (schedule policy == performance) {
>                 //when 2 busiest groups are equally busy,
>                 //need to prefer the one that has the
>                 //softest group??
>                 move tasks from busiest group to
>                 idlest group;
>             } else if (schedule policy == power) {
>                 move tasks from busiest group to
>                 idlest group until busiest is just full
>                 of capacity.
>                 //the busiest group can balance
>                 //internally at the next LB.
>             }
>         } else {
>             //all groups have enough capacity for their tasks.
>             if (schedule policy == performance)
>                 //all tasks may have enough cpu
>                 //resources to run.
>                 //mv tasks from busiest to idlest group?
>                 //no, at this time, it's better to keep
>                 //the task on its current cpu.
>                 //so, it may be better to balance
>                 //within each group
>                 for_each_imbalance_groups()
>                     move tasks from busiest cpu to
>                     idlest cpu in each of the groups;
>             else if (schedule policy == power) {
>                 if (no hard pin in idlest group)
>                     mv tasks from idlest group to
>                     busiest until busiest is full.
>                 else
>                     mv unpinned tasks to the biggest
>                     hard pin group.
>             }
>         }
>     }
> }
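
To make the branches above easier to discuss, here is a minimal C sketch of the same decision tree as a pure function. Every name in it (enum balance_policy, enum lb_action, struct lb_stats, pick_lb_action) is hypothetical and exists only for illustration; nothing here is existing kernel API, and the actual task migration is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical; none of them exist in the kernel. */
enum balance_policy { POLICY_PERFORMANCE, POLICY_POWER };

enum lb_action {
	LB_SPREAD_ACROSS_GROUPS,   /* busiest group -> idlest group */
	LB_DRAIN_TO_CAPACITY,      /* move only the excess out of busiest */
	LB_BALANCE_WITHIN_GROUPS,  /* keep tasks in their group, even out its cpus */
	LB_PACK_INTO_BUSIEST,      /* idlest group -> busiest until busiest is full */
	LB_PACK_INTO_PINNED_GROUP, /* unpinned tasks -> biggest hard pin group */
};

struct lb_stats {
	unsigned int sd_nr_running;      /* running tasks in the whole domain */
	unsigned int sd_capacity;        /* tasks the domain can hold */
	unsigned int busiest_nr_running; /* running tasks in the busiest group */
	unsigned int busiest_capacity;   /* tasks the busiest group can hold */
	bool idlest_has_pinned_tasks;    /* hard pinned tasks in the idlest group */
};

static enum lb_action pick_lb_action(const struct lb_stats *s,
				     enum balance_policy policy)
{
	assert(s->sd_capacity > 0 && s->busiest_capacity > 0);

	/* Domain overloaded: power policy behaves like performance. */
	if (s->sd_nr_running > s->sd_capacity)
		return LB_SPREAD_ACROSS_GROUPS;

	/* Domain fits, but the busiest group is over its capacity. */
	if (s->busiest_nr_running > s->busiest_capacity)
		return policy == POLICY_PERFORMANCE ?
			LB_SPREAD_ACROSS_GROUPS : LB_DRAIN_TO_CAPACITY;

	/* Every group has enough capacity for its tasks. */
	if (policy == POLICY_PERFORMANCE)
		return LB_BALANCE_WITHIN_GROUPS;

	return s->idlest_has_pinned_tasks ?
		LB_PACK_INTO_PINNED_GROUP : LB_PACK_INTO_BUSIEST;
}
```

Keeping the decision separate from the migration is only a choice made for this sketch, so that each branch can be checked against the pseudo code on its own.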
>
> select_task_rq_fair()
> {
>     for_each_domain(cpu, tmp) {
>         if (policy == power && tmp_has_capacity &&
>             tmp->flags & sd_flag) {
>             sd = tmp;
>             //it is fine to get a cpu in this domain
>             break;
>         }
>     }
>
>     while (sd) {
>         if (policy == power)
>             group = find_busiest_and_capable_group();
>         else
>             group = find_idlest_group();
>         if (!group) {
>             sd = sd->child;
>             continue;
>         }
>         ...
>     }
> }
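
The group selection step above might look roughly like the following C sketch. It is a simplification under stated assumptions: struct group_stat and both helpers are hypothetical names, a group is reduced to a task count and a capacity, and the performance side compares only task counts rather than any weighted load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a schedule group. */
struct group_stat {
	unsigned int nr_running; /* running tasks in the group */
	unsigned int capacity;   /* tasks the group can hold */
};

/*
 * Power policy: index of the group with the most running tasks that
 * still has spare capacity, or -1 if every group is already full.
 */
static int busiest_capable_group_index(const struct group_stat *g, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		assert(g[i].capacity > 0);
		if (g[i].nr_running >= g[i].capacity)
			continue; /* full, cannot take another task */
		if (best < 0 || g[i].nr_running > g[best].nr_running)
			best = (int)i;
	}
	return best;
}

/*
 * Performance policy: index of the group with the fewest running
 * tasks, or -1 if there are no groups.
 */
static int idlest_group_index(const struct group_stat *g, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++)
		if (best < 0 || g[i].nr_running < g[best].nr_running)
			best = (int)i;
	return best;
}
```

A return value of -1 from the power side corresponds to the `!group` case in the pseudo code, where the search moves down to sd->child.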
>
> sub proposal:
> 1, Is it possible to balance tasks onto the idlest cpu rather than the
> appointed 'balance cpu'? If so, it may save one more round of balancing.
> The idlest cpu can prefer a newly idle cpu, and otherwise is the least
> loaded cpu.
> 2, se or task load is good for setting running time, but it should be
> the second basis in load balancing. The first basis of LB is the number
> of running tasks in the group/cpu, since whatever the weight of a group
> is, if its number of tasks is less than its number of cpus, the group
> still has capacity to take more tasks. (SMT cpu power and other
> big/little cpu capacity on ARM will be considered.)
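
Sub proposal 2 could be sketched in C as a two-level comparison: task count first, weighted load only as a tie breaker. The struct and function names are hypothetical, and capacity is taken as one task per cpu, which ignores the SMT and big/little cases mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a group for this comparison only. */
struct cpu_group {
	unsigned int nr_running; /* running tasks in the group */
	unsigned int nr_cpus;    /* cpus in the group */
	unsigned long load;      /* summed task weight */
};

/* Assumption: one task per cpu; SMT and big/little are ignored here. */
static bool group_has_capacity(const struct cpu_group *g)
{
	assert(g->nr_cpus > 0);
	return g->nr_running < g->nr_cpus;
}

/*
 * Returns <0 if a is less busy than b, >0 if a is busier, 0 if equal.
 * The running task count is the first basis; load only breaks ties.
 */
static int compare_busyness(const struct cpu_group *a,
			    const struct cpu_group *b)
{
	if (a->nr_running != b->nr_running)
		return a->nr_running < b->nr_running ? -1 : 1;
	if (a->load != b->load)
		return a->load < b->load ? -1 : 1;
	return 0;
}
```

With this ordering, a group running 2 heavy tasks on 4 cpus still counts as less busy than a group running 3 light tasks on 4 cpus, which is the behaviour the sub proposal asks for.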
>
> unsolved issues:
> 1, like the current scheduler, it doesn't handle cpu affinity well in
> load_balance.
> 2, task groups aren't well considered in this rough proposal.
>
> This isn't well thought out and may contain mistakes. So I am just
> sharing my ideas and hope they become better and workable through your
> comments and discussion.
>
> Thanks
> Alex