Re: [PATCH 3/3] sched: terminate newidle balancing once at least one task has moved over

From: Nick Piggin
Date: Mon Jun 23 2008 - 21:47:36 EST


On Tuesday 24 June 2008 12:39, Gregory Haskins wrote:
> Hi Nick,
>
> >>> On Mon, Jun 23, 2008 at 8:50 PM, in message
> <200806241050.12028.nickpiggin@xxxxxxxxxxxx>, Nick Piggin
> <nickpiggin@xxxxxxxxxxxx> wrote:
> > On Tuesday 24 June 2008 09:04, Gregory Haskins wrote:
> >> Inspired by Peter Zijlstra.
> >
> > Is this really getting tested well? Because at least for SCHED_OTHER
> > tasks,
>
> Note that this only affects SCHED_OTHER. RT tasks are handled with a
> different algorithm.
>
> > the newidle balancer is still supposed to be relatively
> > conservative and not over balance too much.
>
> In our testing, newidle is degrading the system (at least for certain
> workloads). Oprofile was showing that newidle can account for 60-80% of
> the CPU during our benchmark runs. Turning off newidle *completely* by
> commenting out idle_balance() boosts netperf performance by 200% for our
> 8-core to 8-core UDP transaction test. Obviously neutering it is not
> sustainable as a general solution, so we are trying to reduce its negative
> impact.
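
(For reference, the call being commented out above is the newidle hook
in schedule(): when a runqueue is about to go idle, it tries to pull
work from other CPUs before switching to the idle task. Roughly, from
kernel/sched.c of this vintage, quoting from memory:

	if (unlikely(!rq->nr_running))
		idle_balance(cpu, rq);

so disabling it means an idling CPU simply goes idle instead of
searching its domains for work.)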

Hmm. I'd like to see an attempt made to tune the algorithm so that
newidle actually won't cause any tasks to be balanced in this case.
That seems like the right thing to do, doesn't it?

Of course... tuning the whole balancer on the basis of a crazy
netperf benchmark is... dangerous :)
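
One concrete knob: each sched domain only does newidle balancing if
SD_BALANCE_NEWIDLE is set in its flags, so it's worth checking whether
the domains on that box really want the flag, or whether the newidle
path can be made to back off sooner. The rough shape of the path
(paraphrased from memory, not the literal source):

	static void idle_balance(int this_cpu, struct rq *this_rq)
	{
		struct sched_domain *sd;
		int pulled_task = 0;

		for_each_domain(this_cpu, sd) {
			if (!(sd->flags & SD_LOAD_BALANCE))
				continue;

			/* only domains with SD_BALANCE_NEWIDLE get searched */
			if (sd->flags & SD_BALANCE_NEWIDLE)
				pulled_task = load_balance_newidle(this_cpu,
								   this_rq, sd);
			if (pulled_task)
				break;
		}
	}

With CONFIG_SCHED_DEBUG the per-domain flags should also be tweakable
at runtime under /proc/sys/kernel/sched_domain/, which makes it easy to
experiment without rebuilding.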


> It is not clear whether the problem is that newidle is over-balancing the
> system, or that newidle is simply running too frequently as a symptom of a
> system that has a high frequency of context switching (such as -rt). I
> suspected the latter, so I was attracted to Peter's idea of shortening
> the time we spend executing this function.  But I have to
> admit, unlike 1/3 and 2/3 which I have carefully benchmarked individually
> and know make a positive performance impact, I pulled this in more on
> theory. I will try to benchmark this individually as well.
>
> > By the time you have
> > done all this calculation and reached here, it will be a loss to only
> > move one task if you could have moved two and halved your newidle
> > balance rate...
>
> That's an interesting point that I did not consider, but note that I
> believe a very significant chunk of the overhead comes from the
> double_lock/move_tasks code, after the algorithmic work is completed.

And that double lock will be amortized if you can move 2 tasks at once,
rather than 1 task each 2 times.
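
In symbols, with C the fixed double-lock/bookkeeping cost per newidle
pass and t the per-task move cost (illustrative symbols, not measured
values):

	C + 2t < 2(C + t) = 2C + 2t

i.e. pulling two tasks in one pass saves a whole fixed cost C compared
with two single-task passes.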


> I believe the primary motivation of this patch is reducing the overall
> latency of the schedule() critical section.  Currently this path can
> perform an unbounded move_tasks() operation in a preempt-disabled
> region (which, as an aside, is always SCHED_OTHER related).
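
If I'm reading the series right, the change amounts to bailing out of
the move_tasks() loop once something has been pulled on the newidle
path, roughly like this (paraphrasing the idea, not the literal diff):

	do {
		total_load_moved +=
			class->load_balance(this_rq, this_cpu, busiest,
				max_load_move - total_load_moved,
				sd, idle, all_pinned, &this_best_prio);
		class = class->next;

		/*
		 * Newidle balancing runs from schedule() with the runqueue
		 * lock held and preemption disabled; this_rq started out
		 * empty, so nr_running going nonzero means we have pulled
		 * a task and can stop scanning.
		 */
		if (idle == CPU_NEWLY_IDLE && this_rq->nr_running)
			break;
	} while (class && max_load_move > total_load_moved);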

Maybe putting some upper cap on it, I could live with. Cutting off at
one task I think needs a lot more thought and testing.
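
An upper cap would only be a one-liner on top of the same idea; purely
illustrative, with newidle_pull_cap being a made-up constant rather
than an existing knob:

	/* stop after a small fixed number of pulls, not the first one */
	if (idle == CPU_NEWLY_IDLE &&
	    this_rq->nr_running >= newidle_pull_cap)
		break;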