On Tue, 2013-12-10 at 12:30 +0100, Daniel Lezcano wrote:
Hi All,
I am trying to understand how avg_idle is computed and how it is used
with respect to the migration cost.
1. What is the sysctl_sched_migration_cost value? It is initialized to
500000UL (nanoseconds, i.e. 0.5 ms). Is it an arbitrarily chosen value?
Could it change depending on hardware performance?
Yeah, it's a magic number. We used to derive it from boot-time measurements.
2. The idle_balance() function checks:

	if (this_rq->avg_idle < sysctl_sched_migration_cost)
		return 0;

IIUC, it is not worth migrating a task to this CPU because we expect to
be running another task before we could pull one over to the current
CPU, right?
No, that's all about not beating the living hell outta ourselves on every
micro-idle. As with all load balancing, it's usually too much balancing
that creates a problem. You need it, but it's really expensive, so less
is more.
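A minimal sketch of the early bail-out in idle_balance(), assuming a stripped-down runqueue. The names struct toy_rq, MIGRATION_COST_NS and idle_balance_worthwhile() are hypothetical stand-ins, not kernel API; the constant only mirrors the 500000UL default. It shows just the gate: a CPU whose smoothed idle time is below the migration cost skips the expensive balancing pass entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sysctl_sched_migration_cost, in nanoseconds. */
#define MIGRATION_COST_NS 500000ULL

/* Hypothetical, stripped-down runqueue holding only what the check reads. */
struct toy_rq {
	uint64_t avg_idle; /* smoothed idle duration in ns */
};

/*
 * Same shape as the early return in idle_balance(): if this CPU's idle
 * periods are typically shorter than the migration cost, a balancing
 * pass is not worth paying for.
 */
static bool idle_balance_worthwhile(const struct toy_rq *rq)
{
	if (rq->avg_idle < MIGRATION_COST_NS)
		return false; /* micro-idle: skip balancing */
	return true;      /* idle long enough: go look for work */
}
```

With avg_idle at 100000 ns the check says no; at 500000 ns or more it lets balancing proceed.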
If there is no task to pull, we enter idle and set idle_stamp to the
current clock.
When a task is later woken up via ttwu_do_wakeup(), the idle duration
is computed there:

	if (rq->idle_stamp) {
		u64 delta = rq_clock(rq) - rq->idle_stamp;
		u64 max = 2*sysctl_sched_migration_cost;

		if (delta > max)
			rq->avg_idle = max;
		else
			update_avg(&rq->avg_idle, delta);
		rq->idle_stamp = 0;
	}

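A user-space sketch of the idle_stamp lifecycle described above, assuming explicit nanosecond timestamps in place of rq_clock(). struct toy_rq, toy_enter_idle() and toy_wakeup() are hypothetical; update_avg() is written to follow the two-line version quoted in this thread. As in the quoted code, idle_stamp == 0 doubles as "not idle".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sysctl_sched_migration_cost, in nanoseconds. */
#define MIGRATION_COST_NS 500000ULL

/* Hypothetical runqueue with just the two fields involved. */
struct toy_rq {
	uint64_t avg_idle;   /* smoothed idle duration in ns */
	uint64_t idle_stamp; /* clock when the CPU went idle; 0 means "not idle" */
};

/* Move avg 1/8 of the way toward sample (same rule as update_avg()). */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)sample - (int64_t)*avg;

	*avg += diff >> 3; /* assumes arithmetic right shift (GCC/Clang) */
}

/* Nothing to pull, so the CPU goes idle: remember when. */
static void toy_enter_idle(struct toy_rq *rq, uint64_t now)
{
	rq->idle_stamp = now;
}

/* Wakeup path: fold the finished idle period into avg_idle. */
static void toy_wakeup(struct toy_rq *rq, uint64_t now)
{
	if (rq->idle_stamp) {
		uint64_t delta, max = 2 * MIGRATION_COST_NS;

		assert(now >= rq->idle_stamp); /* clock must be monotonic */
		delta = now - rq->idle_stamp;
		if (delta > max)
			rq->avg_idle = max; /* long idle: saturate at max */
		else
			update_avg(&rq->avg_idle, delta);
		rq->idle_stamp = 0;
	}
}
```

For example, starting from avg_idle = 0, an 80 µs idle moves avg_idle to 10000 ns (one eighth of the sample), while a 5 ms idle exceeds max (1 ms) and sets avg_idle to 1000000 ns directly.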
Why is 'delta' bounded by 'max'?
That has changed a little recently. I originally slammed avg_idle
itself straight to max to ensure that a bursty load would idle balance
and not use stale data. If you start cross-core switching at high
frequency, you'll still shut idle balancing off quickly.
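To put a number on "you'll still shut idle balancing off quickly", here is a sketch that starts from a saturated avg_idle (2 * 500000 ns, as after a long idle) and feeds it identical short idle periods using the update rule quoted above. samples_until_balance_off() and the constants are hypothetical helpers. It assumes arithmetic right shift of negative values, which C leaves implementation-defined but GCC and Clang provide.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the values in the quoted wakeup code. */
#define MIGRATION_COST_NS 500000ULL
#define AVG_IDLE_MAX (2 * MIGRATION_COST_NS)

/* Move avg 1/8 of the way toward sample (same rule as update_avg()). */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)sample - (int64_t)*avg;

	*avg += diff >> 3; /* assumes arithmetic right shift (GCC/Clang) */
}

/*
 * Start from a saturated avg_idle and feed it identical idle periods of
 * sample_ns. Return how many samples it takes before avg_idle falls
 * below the migration cost (when idle_balance() would start returning
 * early), or -1 if that does not happen within limit samples.
 */
static int samples_until_balance_off(uint64_t sample_ns, int limit)
{
	uint64_t avg = AVG_IDLE_MAX;

	for (int n = 1; n <= limit; n++) {
		update_avg(&avg, sample_ns);
		if (avg < MIGRATION_COST_NS)
			return n;
	}
	return -1;
}
```

With 10 µs micro-idles it takes 6 samples for avg_idle to drop below 500000 ns (roughly 990000 * (7/8)^n + 10000 < 500000), after which the early return in idle_balance() kicks in; with idle periods at or above the migration cost it never drops below.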
3. And finally, update_avg() does:

	s64 diff = sample - *avg;
	*avg += diff >> 3;

Why is diff >> 3 used instead of dividing by the number of samples?
Ingo's quick-like-bunny smooth average.
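In effect, update_avg() is an exponentially weighted moving average with weight 1/8: avg = 7/8 * avg + 1/8 * sample, done with one subtraction and one shift, so no sample count is kept and old samples fade geometrically. The sketch below sets it beside a plain cumulative mean for contrast; struct cum_mean and cum_mean_add() are hypothetical, added only for comparison, and the shift again assumes GCC/Clang arithmetic-shift behavior.

```c
#include <stdint.h>

/* EWMA with weight 1/8, following the update_avg() quoted above. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)sample - (int64_t)*avg;

	/* avg = 7/8 * avg + 1/8 * sample: one shift, no division, no count */
	*avg += diff >> 3;
}

/* Hypothetical contrast: an exact cumulative mean needs a count and a division. */
struct cum_mean {
	uint64_t sum;
	uint64_t count;
};

static uint64_t cum_mean_add(struct cum_mean *m, uint64_t sample)
{
	m->sum += sample;
	m->count++;
	return m->sum / m->count; /* every sample ever seen weighs the same */
}
```

A cumulative mean would let a long history outweigh a sudden burst of micro-idles; the 1/8 EWMA forgets the old regime within a handful of wakeups, which suits a check that should react to the current load pattern.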