Re: [RFC] sched: Limit idle_balance() when it is being used toofrequently

From: Jason Low
Date: Thu Jul 18 2013 - 00:02:36 EST


On Wed, 2013-07-17 at 20:01 +0200, Peter Zijlstra wrote:
> On Wed, Jul 17, 2013 at 01:51:51PM -0400, Rik van Riel wrote:
> > On 07/17/2013 12:18 PM, Peter Zijlstra wrote:
>
> > >So the way I see things is that the only way newidle balance can slow down
> > >things is if it runs when we could have ran something useful.
> >
> > Due to contention on the runqueue locks of other CPUs,
> > newidle also has the potential to keep _others_ from
> > running something useful.
>
> Right, although that should only happen when we do have an imbalance and want
> to go move something. Which in Jason's case is 'rare'. But yes, I suppose
> there's other scenarios where this is far more likely.
>
> > Could we prevent that downside by measuring both the
> > time spent idle, and the time spent in idle balancing,
> > and making sure the idle balancing time never exceeds
> > more than N% of the idle time?
>
> Sure:
>
> idle_balance(u64 idle_duration)
> {
> u64 cost = 0;
>
> for_each_domain(sd) {
> if (cost + sd->cost > idle_duration/N)
> break;
>
> ...
>
> sd->cost = (sd->cost + this_cost) / 2;
> cost += this_cost;
> }
> }
>
> I would've initially suggested using something like N=2 since we're dealing
> with averages and half should ensure we don't run over except for the worst
> peaks. But we could easily use a bigger N.

I ran a few AIM7 workloads for the 8 socket HT enabled case and I needed
to set N to more than 20 in order to get the big performance gains.

One thing that I thought of was to have N be based on how often idle
balance attempts does not pull task(s).

For example, N can be calculated based on the number of idle balance
attempts for the CPU since the last "successful" idle balance attempt.
So if the previous 30 idle balance attempts resulted in no tasks moved,
then n = 30 / 5. So idle balance gets less time to run as the number of
unneeded idle balance attempts increases, and thus N will not be set too
high during situations where idle balancing is "successful" more often.
Any comments on this idea?

Thanks,
Jason


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/