Re: [PATCH V2] sched: Improve load balancing in the presence of idle CPUs

From: Morten Rasmussen
Date: Mon Mar 30 2015 - 11:26:27 EST


On Mon, Mar 30, 2015 at 02:29:09PM +0100, Vincent Guittot wrote:
> On 30 March 2015 at 14:24, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Mon, Mar 30, 2015 at 01:03:03PM +0100, Morten Rasmussen wrote:
> >> On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote:
> >> > On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote:
> >> >
> >> > > I agree that it is hard to predict how many additional cpus you need,
> >> > > but I don't think you necessarily need that information, as long as
> >> > > you start by filling up the cpu that was kicked to do the
> >> > > nohz_idle_balance().
> >> >
> >> > > Reducing unnecessary wakeups is quite important for energy
> >> > > consumption and something that a lot of effort is put into. You
> >> > > really don't want to wake up another cluster/package unnecessarily
> >> > > just because there was only one nohz-idle cpu left in the previous
> >> > > one which could have handled the additional load. It gets even worse
> >> > > if the other cluster is less energy-efficient (big.LITTLE).
> >> >
> >> > So the only way to get tasks to cross your cluster is by balancing that
> >> > domain. At this point we'll compute sg stats for either group
> >> > (=cluster).
> >> >
> >> > The only thing we need to ensure is that it doesn't view the small
> >> > cluster as overloaded (as long as it really isn't, of course); as long
> >> > as it's not viewed as overloaded, it will not pull tasks from it into
> >> > the big cluster, no matter how many ILBs we run before the ILB duty
> >> > cpu's rebalance_domains() call.
> >> >
> >> > I'm really not seeing the problem here.
> >>
> >> I see. The group_classify() should take care of it in all cases of
> >> balancing across clusters. Though if the balancer cpu happens to be the
> >> last one in the little cluster, you would be iterating over all cpus in
> >> the other cluster running rebalance_domains(). However, within the
> >> cluster (in case you have 2 or more nohz-idle cpus) you still take a
> >> double hit. No?
> >
> > It can, yes, but typically it doesn't, I think. This all could use
> > some 'help' for sure.
> >
> > So the thing is, find_new_ilb() simply selects the first cpu in
> > idle_cpus_mask, while at the same time nohz_idle_balance() iterates
> > idle_cpus_mask starting with that same first cpu (obviously).
> >
> > So it is very likely that if we migrate on the ILB, it is in fact to
> > the current CPU.
> >
> > In case we cannot, we have no choice but to wake up a second idle cpu;
> > nothing really to be done about that.
> >
> > To put it another way, for ILB purposes the rebalance_domains() call is
> > mostly superfluous. The only other case is if the selected ILB target
> > became non-idle between being selected and getting to run the softirq
> > handler, at which point we should wake another one anyway.
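
For reference, the selection side currently looks roughly like this (a
simplified sketch of the mainline fair.c code, details elided):

	/* Pick the ILB target: simply the first nohz-idle cpu. */
	static inline int find_new_ilb(void)
	{
		int ilb = cpumask_first(nohz.idle_cpus_mask);

		if (ilb < nr_cpu_ids && idle_cpu(ilb))
			return ilb;

		return nr_cpu_ids;
	}

	/*
	 * nohz_idle_balance() walks the same mask in the same order,
	 * so the cpu picked above is normally the first one visited:
	 */
	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
		...
	}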

Apart from the issues that Vincent has already pointed out, including
the balancing of this_cpu in nohz_idle_balance() also means that we are
no longer ignoring rq->next_balance for the cpu being kicked. So we risk
kicking a cpu to do nohz_idle_balance() which, having done a
load-balance recently, might ignore the fact that we need an ILB (the
kicker should have determined that already).
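
To illustrate, the per-cpu gate inside the nohz_idle_balance() loop is
roughly this (a simplified sketch of the current code):

	rq = cpu_rq(balance_cpu);

	/*
	 * rebalance_domains() only runs once this cpu's next_balance
	 * has expired. If this_cpu goes through the same path, a
	 * recent load-balance can make it skip the very ILB it was
	 * kicked for.
	 */
	if (time_after_eq(jiffies, rq->next_balance)) {
		raw_spin_lock_irq(&rq->lock);
		update_rq_clock(rq);
		update_idle_cpu_load(rq);
		raw_spin_unlock_irq(&rq->lock);
		rebalance_domains(rq, CPU_IDLE);
	}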

Also, if we include this_cpu in nohz_idle_balance() and pick the first
nohz-idle cpu (as we currently do), we are back to where we were before
Preeti's patch: the balancer cpu will bail out if it pulls anything for
itself, without kicking anybody else to take over.
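
That is, we are back to the current bail-out in the loop (simplified
sketch):

	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
			continue;

		/*
		 * Once the balancer cpu has pulled a task for itself,
		 * need_resched() is true and we stop here without
		 * kicking another idle cpu to take over the remaining
		 * balancing.
		 */
		if (need_resched())
			break;
		...
	}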