Re: [RFC][PATCH 14/26] sched, numa: Numa balancer

From: Rik van Riel
Date: Thu Jul 12 2012 - 18:04:44 EST


On 03/16/2012 10:40 AM, Peter Zijlstra wrote:

At LSF/MM, there was a presentation comparing Peter's
NUMA code with Andrea's NUMA code. I believe the
balancing filter quoted below is the main reason why
Andrea's code performed better in that particular test...

+	if (sched_feat(NUMA_BALANCE_FILTER)) {
+		/*
+		 * Avoid moving ne's when we create a larger imbalance
+		 * on the other end.
+		 */
+		if ((imb->type & NUMA_BALANCE_CPU) &&
+		    imb->cpu - cpu_moved < ne_cpu / 2)
+			goto next;
+
+		/*
+		 * Avoid migrating ne's when we know we'll push our
+		 * node over the memory limit.
+		 */
+		if (max_mem_load &&
+		    imb->mem_load + mem_moved + ne_mem > max_mem_load)
+			goto next;
+	}

IIRC the test used a 16GB NUMA system with two 8GB nodes,
running three KVM guests: two guests of 3GB memory each, and
one guest of 6GB.

With autonuma, the 6GB guest ended up on one node, and the
3GB guests on the other.

With sched numa, each node ended up with a 3GB guest plus part
of the 6GB guest.

There is a fundamental difference in the balancing between autonuma
and sched numa.

In sched numa, a process is moved to a node only if that
node has space for it.
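
As a minimal sketch of that policy (not the actual patch code;
task_mem() and node_free_mem() are hypothetical helpers standing
in for the real load tracking), the check boils down to:

	/* Pull a task only when the destination node has headroom. */
	static bool can_move_to(struct task_struct *p, int dst_node)
	{
		return task_mem(p) <= node_free_mem(dst_node);
	}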

Autonuma, on the other hand, uses more of a "hostage exchange"
policy, where a thread on one node is exchanged with a thread on
another node, if the exchange looks like it will reduce the overall
number of cross-node NUMA faults in the system.
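
For illustration, a rough sketch of what such an exchange test
could look like (this is not Andrea's actual code; task_node_faults()
is a hypothetical per-task, per-node fault counter):

	/*
	 * a currently runs on node_a, b on node_b.  Compare the
	 * cross-node faults before and after swapping the two.
	 */
	static bool should_exchange(struct task_struct *a, int node_a,
				    struct task_struct *b, int node_b)
	{
		unsigned long before = task_node_faults(a, node_b) +
				       task_node_faults(b, node_a);
		unsigned long after  = task_node_faults(a, node_a) +
				       task_node_faults(b, node_b);

		/* Swap only if it reduces total cross-node faults. */
		return after < before;
	}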

I am not sure how to do a "hostage exchange" algorithm with
sched numa, but something like it may be necessary for some
workloads to converge on a sane configuration.

After all, with only about 2GB free on each node, you will never
be able to move either a 3GB guest or a 3GB chunk of the 6GB
guest...
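
To make the arithmetic concrete (rough numbers, assuming the 6GB
guest is split roughly evenly across the two nodes):

	per-node capacity:   8GB
	resident per node:   3GB guest + ~3GB of the 6GB guest = ~6GB
	free per node:       ~2GB
	a move needs:        3GB of headroom on the destination -> never there
	an exchange needs:   no extra headroom (3GB out for 3GB in)

Which is why a swap-based scheme can converge where a move-based
one gets stuck.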

Any ideas?