Re: [RFC][PATCH 14/26] sched, numa: Numa balancer

From: Don Morris
Date: Fri Jul 13 2012 - 10:45:22 EST


On 07/12/2012 03:02 PM, Rik van Riel wrote:
> On 03/16/2012 10:40 AM, Peter Zijlstra wrote:
>
> At LSF/MM, there was a presentation comparing Peter's
> NUMA code with Andrea's NUMA code. I believe this is
> the main reason why Andrea's code performed better in
> that particular test...
>
>> + if (sched_feat(NUMA_BALANCE_FILTER)) {
>> + /*
>> + * Avoid moving ne's when we create a larger imbalance
>> + * on the other end.
>> + */
>> + if ((imb->type & NUMA_BALANCE_CPU) &&
>> + imb->cpu - cpu_moved < ne_cpu / 2)
>> + goto next;
>> +
>> + /*
>> + * Avoid migrating ne's when we know we'll push our
>> + * node over the memory limit.
>> + */
>> + if (max_mem_load &&
>> + imb->mem_load + mem_moved + ne_mem > max_mem_load)
>> + goto next;
>> + }
>
> IIRC the test consisted of a 16GB NUMA system with two 8GB nodes.
> It was running 3 KVM guests: two guests of 3GB memory each, and
> one guest of 6GB.

How many cpus per guest (host threads) and how many physical/logical
cpus per node on the host? Any comparisons with a situation where
the memory would fit within nodes but the scheduling load would
be too high?

Don

>
> With autonuma, the 6GB guest ended up on one node, and the
> 3GB guests on the other.
>
> With sched numa, each node had a 3GB guest, and part of the 6GB guest.
>
> There is a fundamental difference in the balancing between autonuma
> and sched numa.
>
> In sched numa, a process is moved over to the current node only if
> the current node has space for it.
>
> Autonuma, on the other hand, operates more of a "hostage exchange"
> policy, where a thread on one node is exchanged with a thread on
> another node, if it looks like that will reduce the overall number
> of cross-node NUMA faults in the system.
>
> I am not sure how to do a "hostage exchange" algorithm with
> sched numa, but it would seem like it could be necessary in order
> for some workloads to converge on a sane configuration.
>
> After all, with only about 2GB free on each node, you will never
> get to move either a 3GB guest, or parts of a 6GB guest...
>
> Any ideas?
>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/