Re: [PATCH 3/3] sched, fair: Stop searching for tasks in newidle balance if there are runnable tasks

From: Peter Zijlstra
Date: Thu Apr 24 2014 - 10:59:42 EST


On Thu, Apr 24, 2014 at 03:08:02PM +0100, Morten Rasmussen wrote:
> On Thu, Apr 24, 2014 at 12:32:31PM +0100, Peter Zijlstra wrote:
> > On Thu, Apr 24, 2014 at 11:30:59AM +0100, Morten Rasmussen wrote:
> > > On Thu, Apr 24, 2014 at 02:30:35AM +0100, Jason Low wrote:
> > > > @@ -6704,7 +6703,12 @@ static int idle_balance(struct rq *this_rq)
> > > > interval = msecs_to_jiffies(sd->balance_interval);
> > > > if (time_after(next_balance, sd->last_balance + interval))
> > > > next_balance = sd->last_balance + interval;
> > > > - if (pulled_task)
> > > > +
> > > > + /*
> > > > + * Stop searching for tasks to pull if there are
> > > > + * now runnable tasks on this rq.
> > > > + */
> > > > + if (pulled_task || this_rq->nr_running > 0)
> > >
> > > Should this be cfs tasks instead?
> > >
> > > + if (pulled_task || this_rq->cfs.h_nr_running > 0)
> > >
> > > 3.15-rc2 commit 35805ff8f4fc535ac85330170d3c56829c87c677 seems to
> > > indicate that using rq->nr_running may lead to trouble.
> > >
> > > The other two patches look good to me.
> >
> > No, this really wants to be nr_running, we want to bail the idle
> > balancer when there's anything runnable present.
> >
> > Note how out: is very careful to return -1 (which results in RETRY_TASK)
> > when rq->nr_running != rq->cfs.h_nr_running.
> >
> > That same out: test also makes the problem that commit fixes
> > impossible again.
>
> I should have done my homework properly. I may be missing something, but
> don't we risk bailing out of idle balance if there is a throttled rt
> task and going straight to idle?

Good point. It depends on the tree: in tip/sched/core Kirill fixed that,
but for -linus this is indeed a problem.

See patches:

46383648b3c7 sched: Revert commit 4c6c4e38c4e9 ("sched/core: Fix endless loop in pick_next_task()")
f4ebcbc0d7e0 sched/rt: Substract number of tasks of throttled queues from rq->nr_running
653d07a6989a sched/rt: Add accessors rq_of_rt_se()
22abdef37ceb sched/rt: Sum number of all children tasks in hierarhy at ->rt_nr_running

So for sched/core everything should be fine. This does mean I have to
queue this patch for /core and not /urgent (/me quickly moves).
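
To make the difference between the two trees concrete, here is a rough
user-space model of the two checks discussed above. It is only a sketch,
not kernel code: struct rq_model, make_rq(), stop_search() and
balance_result() are hypothetical names, and the count_throttled_rt flag
is an assumption standing in for whether throttled rt tasks are still
included in rq->nr_running. It does not try to reproduce the exact out:
test of any particular tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the runqueue counters discussed above. */
struct rq_model {
	unsigned int nr_running;       /* all runnable tasks, any class */
	unsigned int cfs_h_nr_running; /* runnable CFS tasks only */
};

/*
 * Build the counters. If count_throttled_rt is true, throttled rt tasks
 * stay in nr_running (the situation described for -linus); otherwise they
 * are left out (the situation described for sched/core).
 */
static struct rq_model make_rq(unsigned int cfs, unsigned int rt_runnable,
			       unsigned int rt_throttled, bool count_throttled_rt)
{
	struct rq_model rq = {
		.nr_running = cfs + rt_runnable +
			      (count_throttled_rt ? rt_throttled : 0),
		.cfs_h_nr_running = cfs,
	};
	return rq;
}

/* The patched loop condition: stop as soon as anything is runnable. */
static bool stop_search(int pulled_task, const struct rq_model *rq)
{
	return pulled_task || rq->nr_running > 0;
}

/* The out: test: report -1 (RETRY_TASK) when a non-CFS task is present. */
static int balance_result(int pulled_task, const struct rq_model *rq)
{
	/* CFS tasks are a subset of all runnable tasks. */
	assert(rq->nr_running >= rq->cfs_h_nr_running);
	if (rq->nr_running != rq->cfs_h_nr_running)
		return -1;
	return pulled_task;
}
```

With a single throttled rt task and nothing else on the rq,
stop_search() is true when the task is still counted and false when it
is not, which is the case Morten describes. With a runnable rt task the
search stops and balance_result() reports -1, so the pick is retried.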