Re: [RFCv3 PATCH 33/48] sched: Energy-aware wake-up task placement

From: Juri Lelli
Date: Wed Mar 25 2015 - 14:07:17 EST


Hi Peter,

On 24/03/15 16:35, Peter Zijlstra wrote:
> On Wed, Feb 04, 2015 at 06:31:10PM +0000, Morten Rasmussen wrote:
>> +static int energy_aware_wake_cpu(struct task_struct *p)
>> +{
>> + struct sched_domain *sd;
>> + struct sched_group *sg, *sg_target;
>> + int target_max_cap = SCHED_CAPACITY_SCALE;
>> + int target_cpu = task_cpu(p);
>> + int i;
>> +
>> + sd = rcu_dereference(per_cpu(sd_ea, task_cpu(p)));
>> +
>> + if (!sd)
>> + return -1;
>> +
>> + sg = sd->groups;
>> + sg_target = sg;
>> + /* Find group with sufficient capacity */
>> + do {
>> + int sg_max_capacity = group_max_capacity(sg);
>> +
>> + if (sg_max_capacity >= task_utilization(p) &&
>> + sg_max_capacity <= target_max_cap) {
>> + sg_target = sg;
>> + target_max_cap = sg_max_capacity;
>> + }
>> + } while (sg = sg->next, sg != sd->groups);
>> +
>> + /* Find cpu with sufficient capacity */
>> + for_each_cpu_and(i, tsk_cpus_allowed(p), sched_group_cpus(sg_target)) {
>> + int new_usage = get_cpu_usage(i) + task_utilization(p);
>> +
>> + if (new_usage > capacity_orig_of(i))
>> + continue;
>> +
>> + if (new_usage < capacity_curr_of(i)) {
>> + target_cpu = i;
>> + if (!cpu_rq(i)->nr_running)
>> + break;
>> + }
>> +
>> + /* cpu has capacity at higher OPP, keep it as fallback */
>> + if (target_cpu == task_cpu(p))
>> + target_cpu = i;
>> + }
>> +
>> + if (target_cpu != task_cpu(p)) {
>> + struct energy_env eenv = {
>> + .usage_delta = task_utilization(p),
>> + .src_cpu = task_cpu(p),
>> + .dst_cpu = target_cpu,
>> + };
>> +
>> + /* Not enough spare capacity on previous cpu */
>> + if (cpu_overutilized(task_cpu(p), sd))
>> + return target_cpu;
>> +
>> + if (energy_diff(&eenv) >= 0)
>> + return task_cpu(p);
>> + }
>> +
>> + return target_cpu;
>> +}

Mike has kept working on this since the last LPC discussion, and I have
lately been able to spend some cycles on it too, reviewing and discussing
the WIP with him. So, I guess I'll jump in :).

>
> So while you have some cpufreq -> sched coupling (the capacity_curr
> thing) this would be the site where you could provide sched -> cpufreq
> coupling, right?
>

Yes and no, IMHO. It makes perfect sense to trigger cpufreq on the
target_cpu's freq domain, as we know that we are going to add p's
utilization there. However, I was thinking that we could instead
rely on triggering points in {en,de}queue_task_fair and task_tick_fair.
We end up calling one of them every time we wake up a task, make
a load balancing decision, or just run the task itself
(we have to react to changes in a task's phases). This way we should be
able to reduce the number of triggering points and be more general
at the same time.
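
To make the idea a bit more concrete, here is a rough userspace sketch
of what I have in mind. It is not kernel code and not what we will post:
struct cpu_state, sched_request_capacity() and the sketch_*() functions
are hypothetical names made up for illustration, and the policy (ask for
exactly the current usage, clamped to capacity_orig) is a placeholder
assumption. The only point is that enqueue, dequeue and tick all funnel
into a single request site.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical per-CPU bookkeeping; these are not the kernel's structures. */
struct cpu_state {
	unsigned long usage;         /* summed utilization of tasks on this cpu */
	unsigned long capacity_curr; /* capacity at the currently requested OPP */
	unsigned long capacity_orig; /* capacity at the highest OPP */
	unsigned int freq_requests;  /* how many times an OPP change was requested */
};

static struct cpu_state cpus[NR_CPUS];

/*
 * Hypothetical single sched -> cpufreq request site.
 * Placeholder policy: request capacity equal to the current usage,
 * clamped to what the cpu can deliver at its highest OPP.
 */
static void sched_request_capacity(int cpu)
{
	struct cpu_state *cs;
	unsigned long wanted;

	assert(cpu >= 0 && cpu < NR_CPUS);
	cs = &cpus[cpu];

	wanted = cs->usage < cs->capacity_orig ? cs->usage : cs->capacity_orig;
	if (wanted != cs->capacity_curr) {
		cs->capacity_curr = wanted;
		cs->freq_requests++;
	}
}

/* Stand-in for the enqueue path: covers wake-ups and load-balance moves in. */
static void sketch_enqueue_task(int cpu, unsigned long util)
{
	cpus[cpu].usage += util;
	sched_request_capacity(cpu);
}

/* Stand-in for the dequeue path: covers sleeps and load-balance moves out. */
static void sketch_dequeue_task(int cpu, unsigned long util)
{
	struct cpu_state *cs = &cpus[cpu];

	cs->usage -= util < cs->usage ? util : cs->usage;
	sched_request_capacity(cpu);
}

/* Stand-in for the tick path: a running task's utilization changed by delta. */
static void sketch_task_tick(int cpu, long delta)
{
	struct cpu_state *cs = &cpus[cpu];

	if (delta < 0 && (unsigned long)(-delta) > cs->usage)
		cs->usage = 0;
	else if (delta < 0)
		cs->usage -= (unsigned long)(-delta);
	else
		cs->usage += (unsigned long)delta;
	sched_request_capacity(cpu);
}
```

With this shape the wake-up path would not need its own hook: once
target_cpu has been picked, the following enqueue on that cpu issues the
request.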

> So does it make sense to at least put in the right hooks now? I realize
> we'll likely take cpufreq out back and feed it to the bears but
> something managing P states will be there whatever we'll call the new
> fangled thing and this would be the place to hook it still.
>

We should be able to clean up and post something along these lines
fairly soon.

Best,

- Juri
