Re: [RFC PATCH] sched: find the latest idle cpu

From: Daniel Lezcano
Date: Thu Jan 16 2014 - 06:03:24 EST

On 01/15/2014 03:37 PM, Alex Shi wrote:
> On 01/15/2014 03:35 PM, Peter Zijlstra wrote:
>> On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
>>> Currently we just try to find least load cpu. If some cpus idled,
>>> we just pick the first cpu in cpu mask.
>>>
>>> In fact we can get the interrupted idle cpu or the latest idled cpu,
>>> then we may get the benefit from both latency and power.
>>> The selected cpu maybe not the best, since other cpu may be interrupted
>>> during our selecting. But be captious costs too much.
>>
>> No, we should not do anything like this without first integrating
>>
>> At which point we have a sane view of the idle states and can make a
>> sane choice between them.
>
> Any comments to make it better?

Hi Alex,

It is a nice optimization attempt, but I agree with Peter that we should focus on integrating cpuidle.

The question is "how do we integrate cpuidle?"

IMHO, the main problem is the governors, especially the menu governor.

The menu governor tries to predict the events per cpu. This approach, which gave us a nice benefit for power saving, may not fit well with the scheduler.

I think we can classify the events in three categories:

1. fully predictable (timers)
2. partially predictable (e.g. MMC, SSD or network)
3. unpredictable (e.g. keyboard, network ingress after quiescent period)

The menu governor mixes 2 and 3, using statistics and a performance multiplier to reach shallow states, based on heuristics and experimentation for a specific platform.

I was wondering if we shouldn't create a per-task IO latency tracking.

Based mostly on io_schedule and io_schedule_timeout, we track the latency of each task on each device and keep an rb-tree up to date, where the left-most leaf is the minimum latency across all the tasks running on a specific cpu. That allows better tracking when tasks move across cpus.
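
To make the idea more concrete, here is a rough userspace model of the bookkeeping, written in Python rather than kernel C. It is only a sketch under assumptions: the class name, the migrate() helper and the task/device names are all hypothetical, the latency values are passed in by the caller instead of being measured around io_schedule, and a sorted list stands in for the rb-tree (its first element plays the role of the left-most leaf).

```python
import bisect
from collections import defaultdict


class CpuIoLatencyTracker:
    """Hypothetical per-cpu set of IO latencies, kept in ascending order."""

    def __init__(self):
        # Sorted list standing in for the rb-tree: (latency_us, task, device).
        self._sorted = []
        # task -> {device: latency_us}, so a task's entries can be found again.
        self._by_task = defaultdict(dict)

    def _remove_entry(self, latency_us, task, device):
        # Entries are unique per (task, device), so bisect_left lands on it.
        idx = bisect.bisect_left(self._sorted, (latency_us, task, device))
        del self._sorted[idx]

    def update(self, task, device, latency_us):
        """Record the latest latency seen by 'task' on 'device'."""
        old = self._by_task[task].get(device)
        if old is not None:
            self._remove_entry(old, task, device)
        bisect.insort(self._sorted, (latency_us, task, device))
        self._by_task[task][device] = latency_us

    def remove_task(self, task):
        """Drop every entry of 'task' and return them as {device: latency_us}."""
        entries = self._by_task.pop(task, {})
        for device, latency_us in entries.items():
            self._remove_entry(latency_us, task, device)
        return entries

    def min_latency(self):
        """Smallest tracked latency on this cpu, or None if nothing is tracked."""
        return self._sorted[0][0] if self._sorted else None


def migrate(task, src, dst):
    """Move the latency entries of 'task' from one cpu tracker to another."""
    for device, latency_us in src.remove_task(task).items():
        dst.update(task, device, latency_us)
```

With one tracker per cpu, moving a task is just a call to migrate(), and both cpus keep a valid minimum afterwards without rebuilding any statistics.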

With this approach, we have something consistent with the per-task load tracking.

This IO latency tracking gives us the next wake-up event, which we can inject into the cpuidle framework directly. That removes all the code related to the menu governor statistics based on IO events and simplifies the menu governor code a lot. So we would replace a piece of the cpuidle code with scheduler code, which I hope could be better for prediction and would be one part of the integration.
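
As an illustration of what "injecting the next wake-up event" could look like, here is a small Python sketch. Everything in it is an assumption made for the example: the state table, its names and numbers are hypothetical, and the selection rule (deepest state whose target residency fits the predicted idle time and whose exit latency respects an optional limit) is the usual cpuidle reasoning, not the actual menu governor code.

```python
from collections import namedtuple

IdleState = namedtuple("IdleState", "name exit_latency_us target_residency_us")

# Hypothetical idle state table, ordered from shallowest to deepest.
STATES = [
    IdleState("WFI", 1, 1),
    IdleState("cpu-retention", 100, 500),
    IdleState("cluster-off", 1500, 5000),
]


def next_wakeup_us(next_timer_us, min_io_latency_us):
    """Earliest expected wake-up: the next timer or the smallest IO latency."""
    known = [v for v in (next_timer_us, min_io_latency_us) if v is not None]
    return min(known) if known else None


def pick_idle_state(states, predicted_idle_us, latency_limit_us=None):
    """Return the deepest state that fits the predicted idle time.

    A predicted_idle_us of None means no wake-up is expected, so only the
    optional exit latency limit can hold the choice back.
    """
    chosen = states[0]
    for state in states[1:]:
        if (predicted_idle_us is not None
                and state.target_residency_us > predicted_idle_us):
            break
        if (latency_limit_us is not None
                and state.exit_latency_us > latency_limit_us):
            break
        chosen = state
    return chosen
```

For example, with a timer 10000us away and a minimum IO latency of 800us on the cpu, next_wakeup_us() gives 800 and pick_idle_state(STATES, 800) stops at "cpu-retention", where the timer alone would have allowed "cluster-off".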

In order to finish integrating the cpuidle framework into the scheduler, there are pending questions about the impact on the current design.

Peter or Ingo, if you have time, could you have a look at the email I sent previously [1]?


-- Daniel

