Re: [PATCH RFC/TEST] sched: make sync affine wakeups work

From: Rik van Riel
Date: Mon May 05 2014 - 07:29:23 EST


On 05/05/2014 12:50 AM, Preeti U Murthy wrote:

> Yeah now I see it. But I still feel wake_affine() and
> select_idle_sibling() are not at fault primarily because when they were
> introduced, I don't think it was foreseen that the cpu topology would
> grow to the extent it is now.

It's not about "fault", it is about the fact that on current
large NUMA systems they are broken, and could stand some
improvement :)

> select_idle_sibling() for instance scans the cpus within the purview of
> the last level cache of a cpu and this was a small set. Hence there was
> no overhead. Now with many cpus sharing the L3 cache, we see an
> overhead. wake_affine() probably did not expect the NUMA nodes to come
> under its governance as well and hence it sees no harm in waking up
> tasks close to the waker because it still believes that it will be
> within a node.

If two tasks truly are related to each other, I think we
will want to have the wake_affine logic pull them towards
each other, all the way across a giant NUMA system if
need be.

The problem is that the current wake_affine logic starts
in the ON position, and only switches off in a few very
specific scenarios.

I suspect we would be better off with the reverse, starting
with wake_affine in the off position, and switching it on
when we detect it makes sense to do so.
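
To make the contrast concrete, here is a toy userspace sketch of the
two policies. It is not kernel code and not a patch: struct toy_task,
the toy_* helpers and the threshold of 3 are all made up for
illustration, and "the same partner was woken several times in a row"
is only one assumed stand-in for "we detect it makes sense".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task bookkeeping; not a real kernel structure. */
struct toy_task {
	int last_partner;	/* id of the task most recently woken, -1 if none */
	unsigned int streak;	/* consecutive wakeups of that same partner */
};

/* Assumed tuning value, chosen only for this example. */
#define TOY_AFFINE_STREAK_THRESHOLD 3

/* Record that 'waker' has just woken the task with id 'wakee_id'. */
static void toy_record_wakeup(struct toy_task *waker, int wakee_id)
{
	assert(waker != NULL);

	if (waker->last_partner == wakee_id) {
		waker->streak++;
	} else {
		/* New partner: restart the count. */
		waker->last_partner = wakee_id;
		waker->streak = 1;
	}
}

/*
 * "Starts in the ON position": pull the wakee towards the waker
 * unless one of a few specific vetoes applies.
 */
static bool toy_affine_default_on(bool veto)
{
	return !veto;
}

/*
 * "Starts in the OFF position": only pull the wakee towards the
 * waker once there is evidence that the two tasks are related.
 */
static bool toy_affine_default_off(const struct toy_task *waker, int wakee_id)
{
	assert(waker != NULL);

	return waker->last_partner == wakee_id &&
	       waker->streak >= TOY_AFFINE_STREAK_THRESHOLD;
}
```

With a task initialised to { .last_partner = -1, .streak = 0 }, the
default-off variant stays off for the first two wakeups of a given
partner and switches on at the third, while the default-on variant
is affine from the very first wakeup unless something vetoes it.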

--
All rights reversed