Re: [RFCv3 PATCH 33/48] sched: Energy-aware wake-up task placement

From: Morten Rasmussen
Date: Fri Mar 27 2015 - 12:37:09 EST


On Wed, Mar 18, 2015 at 08:15:59PM +0000, Sai Gurrappadi wrote:
> On 03/16/2015 07:47 AM, Morten Rasmussen wrote:
> > Again you are right. We could make the + task_utilization(p) conditional
> > on i != task_cpu(p). One argument against doing that is that in
> > select_task_rq_fair() task_utilization(p) hasn't been decayed yet while
> > its blocked load on the previous cpu (rq) has. If the task has been gone
> > for a long time, its blocked contribution may have decayed to zero and
> > therefore be a poor estimate of the utilization increase caused by
> > putting the task back on the previous cpu. Particularly if we still use
> > the non-decayed task_utilization(p) to estimate the utilization increase
> > on other cpus (!task_cpu(p)). In the interest of responsiveness and not
> > trying to squeeze tasks back onto the previous cpu which might soon run
> > out of capacity when utilization increases we could leave it as a sort
> > of performance bias.
> >
> > In any case it deserves a comment in the code I think.
>
> I think it makes sense to use the non-decayed value of the task's
> contrib. on wake but I am not sure if we should do this 2x accounting
> all the time.

It would be better if we could find a way to remove the blocked load
contribution and only use the non-decayed value. I'll have a look and
see if I can do better.
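
To make the double accounting concrete, here is a minimal user-space
sketch, not kernel code. The structs and estimate_util() are hypothetical
names for illustration only. It also assumes that we know how much of the
task's decayed contribution is still included in the previous cpu's
blocked sum, which is exactly the part that is hard to get in practice.

```c
#include <assert.h>

/* Hypothetical, simplified per-cpu state; not the kernel's struct rq. */
struct cpu_model {
	unsigned long running_util;	/* utilization of runnable tasks */
	unsigned long blocked_util;	/* decayed sum of blocked tasks */
};

/* Hypothetical view of the waking task. */
struct task_model {
	int prev_cpu;			/* cpu the task last ran on */
	unsigned long util;		/* non-decayed utilization at wake-up */
	unsigned long blocked_contrib;	/* ASSUMPTION: what is left of this
					 * task in prev_cpu's blocked_util */
};

/*
 * Estimate the utilization of 'cpu' if task 'p' were placed there.
 * On the previous cpu the task's decayed blocked contribution is removed
 * first, so the task is only counted once, at its non-decayed value.
 */
unsigned long estimate_util(const struct cpu_model *cpus, int cpu,
			    const struct task_model *p)
{
	unsigned long util;

	assert(cpu >= 0);
	util = cpus[cpu].running_util + cpus[cpu].blocked_util;

	if (cpu == p->prev_cpu) {
		unsigned long stale = p->blocked_contrib;

		/* Never subtract more than the blocked sum holds. */
		if (stale > cpus[cpu].blocked_util)
			stale = cpus[cpu].blocked_util;
		util -= stale;
	}

	return util + p->util;
}
```

With this the previous cpu and the other cpus are compared on the same
basis: every candidate gets the full non-decayed task utilization added
exactly once.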

> Another slightly related issue is that NOHZ could cause blocked rq sums
> to remain stale for long periods if there aren't frequent enough
> idle/nohz-idle-balances. This would cause the above bit and
> energy_diff() to compute incorrect values.

I have looked into load tracking behaviour when cpus are in nohz idle.
It is not easy to fix properly. You will either need to put the burden
of updating the blocked load of the nohz-idle cpu on one of the non-idle
cpus and thereby spend precious cycles on busy cpus, or make sure to
kick a nohz-idle cpu to do the updates on a regular basis.

I am experimenting a bit with a third option, which is to 'pre-decay' the
blocked load/usage when a cpu enters nohz-idle, based on the predicted
length of the nohz-idle period. When the cpu exits nohz-idle I swap the
non-decayed blocked load back in so it gets decayed properly, as if no
pre-decay had happened. If some other cpu running nohz_idle_balance()
decides to update the blocked load, the original is swapped back in as
well. It isn't bulletproof, as nohz_idle_balance() updates from other
cpus ruin the pre-decay, and the prediction used for the pre-decay might
be wrong. So I'm not really convinced that it is the right way to go.
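
Roughly, the swap logic looks like the user-space model below. This is
only a sketch of the idea and not the actual experiment: struct
nohz_blocked and all function names are made up for illustration, the
decay assumes that blocked load halves every 32 periods, and the
fixed-point constant is an assumed approximation of the per-period decay
factor y with y^32 = 1/2.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: blocked load halves every 32 periods. */
#define HALF_LIFE_PERIODS 32

/* Decay 'val' over 'periods' periods using integer arithmetic only. */
static uint32_t decay(uint32_t val, unsigned int periods)
{
	/* Assumed approximation of y * 2^32, where y^32 = 1/2. */
	const uint64_t y_fp = 0xfa83b2daULL;
	unsigned int halvings = periods / HALF_LIFE_PERIODS;
	unsigned int rem = periods % HALF_LIFE_PERIODS;
	uint64_t v = val;
	unsigned int i;

	if (halvings >= 32)
		return 0;
	v >>= halvings;
	for (i = 0; i < rem; i++)
		v = (v * y_fp) >> 32;
	return (uint32_t)v;
}

/* Hypothetical per-cpu blocked load state. */
struct nohz_blocked {
	uint32_t blocked;	/* value seen by other cpus */
	uint32_t saved;		/* non-decayed value kept while pre-decayed */
	int pre_decayed;
};

/* Entering nohz-idle: publish the value predicted for the idle period. */
void enter_nohz_idle(struct nohz_blocked *s, unsigned int predicted)
{
	s->saved = s->blocked;
	s->blocked = decay(s->blocked, predicted);
	s->pre_decayed = 1;
}

/* Swap the non-decayed value back in before any real update. */
static void restore(struct nohz_blocked *s)
{
	if (s->pre_decayed) {
		s->blocked = s->saved;
		s->pre_decayed = 0;
	}
}

/* Leaving nohz-idle: decay the original by the actual idle time. */
void exit_nohz_idle(struct nohz_blocked *s, unsigned int actual)
{
	restore(s);
	s->blocked = decay(s->blocked, actual);
}

/* Update by another cpu, e.g. from nohz_idle_balance(): pre-decay is lost. */
void remote_update(struct nohz_blocked *s, unsigned int elapsed)
{
	restore(s);
	s->blocked = decay(s->blocked, elapsed);
}
```

The model shows both weak spots: after remote_update() the published
value is no longer pre-decayed, and between enter_nohz_idle() and
exit_nohz_idle() other cpus see whatever the prediction gave, right or
wrong.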

Any better ideas?

NOHZ full (tickless busy) is a nightmare for accurate load-tracking that
I don't want to face right now.

Morten