Re: [PATCH] sched,numa: do not move past the balance point if unbalanced

From: Peter Zijlstra
Date: Thu Jan 15 2015 - 05:45:26 EST


On Mon, Jan 12, 2015 at 04:30:39PM -0500, Rik van Riel wrote:
> There is a subtle interaction between the logic introduced in commit
> e63da03639cc9e6e83b62e7ef8ffdbb92421416a, the way the load balancer

e63da03639cc ("sched/numa: Allow task switch if load imbalance improves")

I have the below in my .gitconfig to easily create them things:

[core]
abbrev = 12
[alias]
one = show -s --pretty='format:%h (\"%s\")'

> counts the load on each NUMA node, and the way NUMA hinting faults are
> done.
>
> Specifically, the load balancer only counts currently running tasks
> in the load, while NUMA hinting faults may cause tasks to stop, if
> the page is locked by another task.
>
> This could cause all of the threads of a large single instance workload,
> like SPECjbb2005, to migrate to the same NUMA node. This was possible
> because occasionally they all fault on the same few pages, and only one
> of the threads remains runnable. That thread can move to the process's
> preferred NUMA node without making the imbalance worse, because nothing
> else is running at that time.
>
> The fix is to check the direction of the net moving of load, and to
> refuse a NUMA move if it would cause the system to move past the point
> of balance. In an unbalanced state, only moves that bring us closer
> to the balance point are allowed.

Did you also test with whatever load needed the previous thing? Its far
too easy to fix one and break the other in my experience ;-)

> orig_src_load = env->src_stats.load;
> - orig_dst_load = env->dst_stats.load;
>
> - if (orig_dst_load < orig_src_load)
> - swap(orig_dst_load, orig_src_load);
> -
> - old_imb = orig_dst_load * src_capacity * 100 -
> - orig_src_load * dst_capacity * env->imbalance_pct;
> + /*
> + * In a task swap, there will be one load moving from src to dst,
> + * and another moving back. This is the net sum of both moves.
> + * Allow the move if it brings the system closer to a balanced
> + * situation, without crossing over the balance point.
> + */

This comment seems to 'forget' about the !swap moves?

> + moved_load = orig_src_load - src_load;
>
> - /* Would this change make things worse? */
> - return (imb > old_imb);
> + if (moved_load > 0)
> + /* Moving src -> dst. Did we overshoot balance? */
> + return src_load < dst_load;

So here we inhibit movement when the src cpu gained (moved_load > 0) and
src was smaller than dst.

However there is no check the new src value is in fact bigger than dst;
src could have gained and still be smaller. And afaict that's a valid
move, even under the proposed semantics, right?

> + else
> + /* Moving dst -> src. Did we overshoot balance? */
> + return dst_load < src_load;

And vs.

> }

One should use capacity muck when comparing load between CPUs.

In any case, brain hurt.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/