Re: [PATCH] sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1

From: Suresh Siddha
Date: Fri Feb 11 2011 - 20:20:24 EST


On Wed, 2011-02-09 at 07:55 -0800, Peter Zijlstra wrote:
> On Mon, 2011-02-07 at 11:53 -0800, Suresh Siddha wrote:
> >
> > Peter, to answer your question of why SMT is treated differently from
> > cores sharing a cache: the performance improvement contributed by SMT is
> > far smaller than that of additional cores, and any wrong decision in SMT
> > load balancing (especially in the presence of idle cores or packages)
> > has a bigger impact.
> >
> > I think in the tbench case referred to by Nick, idle HT siblings in a
> > busy package picked up the load instead of the idle packages. And thus
> > we probably had to wait for active load balance to kick in to distribute
> > the load, by which time the damage would have been done. The performance
> > impact of this condition wouldn't be as severe for cores sharing the
> > last level cache and other resources.
> >
> > Also, there have been a lot of changes in this area since 2005. So it would be
> > nice to revisit the tbench case and see if the logic of propagating busy
> > sibling status to the higher level load balances is still needed or not.
> >
> > On the other hand, there might be some workloads which would benefit
> > in performance/latency if we completely did away with this less
> > aggressive SMT load balancing.
>
> Right, but our current capacity logic does exactly that and seems to
> work for more than 2 smt siblings (it does the whole asymmetric power7
> muck).
>
> From a quick glance at the sched.c state at the time of Nick's patch,
> the capacity logic wasn't around then.

Yes, Peter. We have a lot more logic now that tries to predict the
imbalance between the groups more accurately.

>
> So I see no reason whatsoever to keep this SMT exception.

I am also OK with removing this code. But as Venki mentioned earlier
(http://marc.info/?l=linux-kernel&m=129735866732171&w=2), we need to
make sure an idle core gets priority over an idle SMT thread on a
busy core when pulling load from the busiest socket.
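
To make the priority rule concrete, here is a minimal userspace sketch of
"prefer an idle core over an idle SMT thread on a busy core". It is only an
illustration, not the scheduler code and not the patches under discussion:
struct toy_cpu, toy_core_is_idle() and toy_pick_pull_target() are
hypothetical names, and the flat array of CPUs is an assumed stand-in for
the real sched domain/group hierarchy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code: one entry per logical CPU. */
struct toy_cpu {
	int core_id;	/* CPUs sharing a core_id are SMT siblings */
	bool idle;	/* true if nothing runs on this logical CPU */
};

/* True if every SMT thread of the core containing cpus[i] is idle. */
static bool toy_core_is_idle(const struct toy_cpu *cpus, size_t n, size_t i)
{
	for (size_t j = 0; j < n; j++)
		if (cpus[j].core_id == cpus[i].core_id && !cpus[j].idle)
			return false;
	return true;
}

/*
 * Pick the CPU that should pull load from the busiest socket.
 * First choice: an idle CPU whose whole core is idle.
 * Fallback: the first idle SMT thread found on a partially busy core.
 * Returns the CPU index, or -1 if no CPU is idle.
 */
static int toy_pick_pull_target(const struct toy_cpu *cpus, size_t n)
{
	int fallback = -1;

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue;
		if (toy_core_is_idle(cpus, n, i))
			return (int)i;		/* fully idle core wins */
		if (fallback < 0)
			fallback = (int)i;	/* remember idle sibling */
	}
	return fallback;
}
```

With core 0 half busy and core 1 fully idle, the sketch selects a thread on
core 1 even though the idle sibling on core 0 comes first in the array; the
sibling is used only when no fully idle core exists.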

I have asked Venki to post two patches: one removing the propagation of
busy sibling status to an idle sibling, and one prioritizing the idle core
when pulling load. I will ask Alex and Tim to run their performance
workloads to make sure that these changes don't show any regressions.

thanks,
suresh
