[4/11] issue 4: Tracking idle states

From: Morten Rasmussen
Date: Tue Jan 07 2014 - 11:22:08 EST


Similar to the issue of knowing the potential capacity of a cpu, the CFS
scheduler also needs to know the idle state of idle cpus. Currently, an
idle cpu is found using cpumask_first() when an extra cpu is needed (for
nohz_idle_balance in find_new_ilb() in sched/fair.c). The energy
trade-off of whether to wake another cpu or put tasks on already busy
cpus depends on this information.

The cost of waking up a cpu in terms of latency and energy depends on
the idle state the cpu is in. Deeper idle states typically affect more
than a single cpu. Waking up a single cpu from such a state is more
expensive as it also affects the idle states of its related cpus.

Energy costs are not currently represented in the cpuidle framework, but
latency is. Take ARM TC2 as an example [1]. It has two idle states:
per-core clock-gating (WFI) and cluster power-down (power down all
related cpus and caches). The target residencies and exit latencies
specified in the driver give an idea of the cost involved in
entering/exiting these states.

                        Target          Exit
                        residency (us)  latency (us)
Clock-gating (WFI)      1               1
Cluster power-down      2000/2500       500/700        (big/LITTLE)

Picking the cheapest idle cpu would also have the effect that wake-ups
are likely to happen on the same cpu and leave the remaining cpus in
idle for longer.
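
As a rough illustration of the difference between a cpumask_first()-style
pick and picking the cheapest idle cpu, here is a minimal userspace
sketch. struct idle_cpu_info, first_idle_cpu() and cheapest_idle_cpu()
are hypothetical names made up for this example, not kernel interfaces,
and exit latency is used as the only cost metric.

```c
#include <stddef.h>

/* Hypothetical per-cpu record for illustration; not a kernel structure. */
struct idle_cpu_info {
    int idle;                  /* non-zero if the cpu is currently idle */
    unsigned int exit_latency; /* exit latency of its current idle state */
};

/* Mimics a first-set-bit pick: lowest-numbered idle cpu, or -1 if none. */
static int first_idle_cpu(const struct idle_cpu_info *cpus, size_t nr_cpus)
{
    for (size_t i = 0; i < nr_cpus; i++)
        if (cpus[i].idle)
            return (int)i;
    return -1;
}

/* Picks the idle cpu with the lowest exit latency, or -1 if none is idle. */
static int cheapest_idle_cpu(const struct idle_cpu_info *cpus, size_t nr_cpus)
{
    int best = -1;

    for (size_t i = 0; i < nr_cpus; i++) {
        if (!cpus[i].idle)
            continue;
        if (best < 0 || cpus[i].exit_latency < cpus[(size_t)best].exit_latency)
            best = (int)i;
    }
    return best;
}
```

With cpus 0-1 in cluster power-down (exit latency 500), cpu 2 busy and
cpu 3 in WFI (exit latency 1), first_idle_cpu() returns 0 while
cheapest_idle_cpu() returns 3, leaving the powered-down cluster alone.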

Potential solution: Make the scheduler idle-state aware, either by
moving idle handling into the scheduler or by letting the idle framework
(cpuidle) maintain a cpumask of the cheapest cpus to wake up which is
accessible to the scheduler.
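
The second option could look roughly like the userspace sketch below,
where the idle framework updates a mask of cheapest-to-wake cpus on
every idle entry and exit. Everything here is an assumption made for
illustration: struct idle_tracker, the tracker_*() helpers, the plain
32-bit mask standing in for a struct cpumask, and the use of exit
latency as the cost. Locking is omitted.

```c
#include <limits.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical bookkeeping an idle framework could maintain. */
struct idle_tracker {
    unsigned int exit_latency[NR_CPUS]; /* valid only while the cpu is idle */
    uint32_t idle_mask;                 /* bit n set: cpu n is idle */
    uint32_t cheapest_mask;             /* idle cpus with the lowest exit latency */
};

/* Rebuild cheapest_mask from the current idle cpus. */
static void recompute_cheapest(struct idle_tracker *t)
{
    unsigned int min_latency = UINT_MAX;

    t->cheapest_mask = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        uint32_t bit = UINT32_C(1) << cpu;

        if (!(t->idle_mask & bit))
            continue;
        if (t->exit_latency[cpu] < min_latency) {
            /* New minimum: restart the mask with this cpu only. */
            min_latency = t->exit_latency[cpu];
            t->cheapest_mask = bit;
        } else if (t->exit_latency[cpu] == min_latency) {
            t->cheapest_mask |= bit;
        }
    }
}

/* Called when a cpu enters an idle state with the given exit latency. */
static void tracker_enter_idle(struct idle_tracker *t, int cpu,
                               unsigned int exit_latency)
{
    t->exit_latency[cpu] = exit_latency;
    t->idle_mask |= UINT32_C(1) << cpu;
    recompute_cheapest(t);
}

/* Called when a cpu leaves idle. */
static void tracker_exit_idle(struct idle_tracker *t, int cpu)
{
    t->idle_mask &= ~(UINT32_C(1) << cpu);
    recompute_cheapest(t);
}
```

The scheduler side would then only need to pick any cpu from
cheapest_mask instead of the first idle cpu. Recomputing the mask on
every transition is the simplest choice for a sketch; a real
implementation would need to keep this cheap on the idle entry/exit
path.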

[1] drivers/cpuidle/cpuidle-big_little.c
