Re: [patch v6 12/21] sched: add power aware scheduling in fork/exec/wake

From: Preeti U Murthy
Date: Mon Apr 01 2013 - 05:52:38 EST


Hi Alex,

The below patch has an issue in the select_task_rq_fair(), which I have mentioned in
the fix for it below.

On 03/30/2013 08:04 PM, Alex Shi wrote:
> This patch add power aware scheduling in fork/exec/wake. It try to
> select cpu from the busiest while still has utilization group. That's
> will save power since it leaves more groups idle in system.
>
> The trade off is adding a power aware statistics collection in group
> seeking. But since the collection just happened in power scheduling
> eligible condition, the worst case of hackbench testing just drops
> about 2% with powersaving policy. No clear change for performance
> policy.
>
> The main function in this patch is get_cpu_for_power_policy(), that
> will try to get a idlest cpu from the busiest while still has
> utilization group, if the system is using power aware policy and
> has such group.
>
> Signed-off-by: Alex Shi <alex.shi@xxxxxxxxx>
> ---
> kernel/sched/fair.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 103 insertions(+), 6 deletions(-)
+/*
> + * select_task_rq_fair: balance the current task (running on cpu) in domains
> * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
> * SD_BALANCE_EXEC.
> *
> - * Balance, ie. select the least loaded group.
> - *
> * Returns the target CPU number, or the same CPU if no balancing is needed.
> *
> * preempt must be disabled.
> */
> static int
> -select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
> +select_task_rq_fair(struct task_struct *p, int sd_flag, int flags)
> {
> struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
> int cpu = smp_processor_id();
> int prev_cpu = task_cpu(p);
> int new_cpu = cpu;
> int want_affine = 0;
> - int sync = wake_flags & WF_SYNC;
> + int sync = flags & WF_SYNC;
> + struct sd_lb_stats sds;
>
> if (p->nr_cpus_allowed == 1)
> return prev_cpu;
> @@ -3412,11 +3500,20 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
> break;
> }
>
> - if (tmp->flags & sd_flag)
> + if (tmp->flags & sd_flag) {
> sd = tmp;
> +
> + new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds);
> + if (new_cpu != -1)
> + goto unlock;
> + }
> }
>
> if (affine_sd) {
> + new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds);
> + if (new_cpu != -1)
> + goto unlock;
> +
> if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
> prev_cpu = cpu;
>


sched: Fix power aware scheduling in fork/wake/exec

From: Preeti U Murthy <preeti@xxxxxxxxxxxxxxxxxx>

Problem:

select_task_rq_fair() returns a target CPU/ waking CPU if no balancing is
required. However with the current power aware scheduling in this path, an
invalid CPU might be returned.

If get_cpu_for_power_policy() fails to find a new_cpu for the forked task,
then there is a possibility that the new_cpu could continue to be -1, till
the end of the select_task_rq_fair() if the search for a new cpu ahead in
this function also fails. Since this scenario is unexpected by
the callers of select_task_rq_fair(),this needs to be fixed.

Fix:

Do not intermix the variables meant to reflect the target CPU of power save
and performance policies. If the target CPU of powersave is successful in being
found, return it. Else allow the performance policy to take a call on the
target CPU.

The above scenario was caught when a kernel crash resulted with a bad data access
interrupt, during a kernbench run on a 2 socket,16 core machine,with each core
having SMT-4
---
kernel/sched/fair.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 56b96a7..54d4400 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3532,6 +3532,7 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int flags)
int cpu = smp_processor_id();
int prev_cpu = task_cpu(p);
int new_cpu = cpu;
+ int new_cpu_for_power = -1;
int want_affine = 0;
int sync = flags & WF_SYNC;
struct sd_lb_stats sds;
@@ -3563,16 +3564,16 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int flags)
if (tmp->flags & sd_flag) {
sd = tmp;

- new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds,
+ new_cpu_for_power = get_cpu_for_power_policy(sd, cpu, p, &sds,
flags & SD_BALANCE_FORK);
- if (new_cpu != -1)
+ if (new_cpu_for_power != -1)
goto unlock;
}
}

if (affine_sd) {
- new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds, 0);
- if (new_cpu != -1)
+ new_cpu_for_power = get_cpu_for_power_policy(affine_sd, cpu, p, &sds, 0);
+ if (new_cpu_for_power != -1)
goto unlock;

if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
@@ -3622,8 +3623,10 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int flags)
}
unlock:
rcu_read_unlock();
+ if (new_cpu_for_power == -1)
+ return new_cpu;

- return new_cpu;
+ return new_cpu_for_power;
}

/*

Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/