Re: [Xen-devel] NUMA_BALANCING and Xen PV guest regression in 3.20-rc0

From: Dario Faggioli
Date: Mon Feb 23 2015 - 10:23:45 EST


Hi everyone,

On Thu, 2015-02-19 at 17:01 +0000, Mel Gorman wrote:
> On Thu, Feb 19, 2015 at 01:06:53PM +0000, David Vrabel wrote:

> I cannot think of a reason why this would fail for NUMA balancing on bare
> metal. The PAGE_NONE protection clears the present bit on p[te|md]_modify
> so the expectations are matched before or after the patch is applied. So,
> for bare metal at least
>
> Acked-by: Mel Gorman <mgorman@xxxxxxx>
>
> I *think* this will work ok with Xen but I cannot 100% convince myself.
> I'm adding Wei Liu to the cc who may have a Xen PV setup handy that
> supports NUMA and may be able to test the patch to confirm.
>
I'm not Wei, but I've been able to test a kernel with David's patch in
the following conditions:

1. as Dom0 kernel, when Xen does not have any virtual NUMA support
2. as DomU PV kernel, when Xen does not have any virtual NUMA support
3. as DomU PV kernel, when Xen _does_ _have_ virtual NUMA support
   (i.e., Wei's code)

Cases 1. and 2. have been, I believe, tested by David already, but
anyways... :-)

Case 3. worked well for me, as the following commands show. In fact,
with this in the guest config file:

vnuma = [ [ "pnode=0","size=1000","vcpus=0-3","vdistances=10,20" ],
          [ "pnode=1","size=1000","vcpus=4-7","vdistances=20,10" ],
        ]

This is what I get from inside the guest:

root@test-pv:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 951 MB
node 0 free: 868 MB
node 1 cpus: 4 5 6 7
node 1 size: 968 MB
node 1 free: 924 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

And this is what I get from the host:

root@Zhaman:~# xl debug-keys u ; xl dmesg |tail -12
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 1047417):
(XEN) Node 0: 1031009
(XEN) Node 1: 16408
(XEN) Domain 1 (total: 512000):
(XEN) Node 0: 256000
(XEN) Node 1: 256000
(XEN) 2 vnodes, 8 vcpus, guest physical layout:
(XEN) 0: pnode 0, vcpus 0-3
(XEN) 0000000000000000 - 000000003e800000
(XEN) 1: pnode 1, vcpus 4-7
(XEN) 000000003e800000 - 000000007d000000
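
As a quick sanity check, the host-side numbers can be compared with the
"size=1000" entries in the guest config. The sketch below is only an
illustration: it assumes 4 KiB pages and that the per-node figures printed
by the 'u' debug key are page counts. The helper names are made up for
this example.

```python
# Sanity check of the Xen 'u' debug-key output against the vnuma config.
# Assumptions (not stated in the output itself): per-node figures are
# page counts, and the page size is 4 KiB.
PAGE_SIZE = 4096
MIB = 1024 * 1024


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / MIB


def range_to_mib(start_hex, end_hex):
    """Size in MiB of a guest physical range given as hex strings."""
    return (int(end_hex, 16) - int(start_hex, 16)) / MIB


# Values copied from the Domain 1 section of the output above.
domain1_pages = {"Node 0": 256000, "Node 1": 256000}
vnode_ranges = [
    ("0000000000000000", "000000003e800000"),  # vnode 0
    ("000000003e800000", "000000007d000000"),  # vnode 1
]

for node, pages in domain1_pages.items():
    print(f"{node}: {pages_to_mib(pages):.0f} MiB")

for idx, (start, end) in enumerate(vnode_ranges):
    print(f"vnode {idx}: {range_to_mib(start, end):.0f} MiB")
```

Under those assumptions, each physical node holds 1000 MiB of Domain 1 and
each vnode's guest physical range is also 1000 MiB wide, which matches the
config.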


Still inside the guest, I see this:

root@test-pv:~# cat /proc/sys/kernel/numa_balancing
1

And this:

root@test-pv:~# grep numa /proc/vmstat
numa_hit 65987
numa_miss 0
numa_foreign 0
numa_interleave 14473
numa_local 58642
numa_other 7345
numa_pte_updates 596
numa_huge_pte_updates 0
numa_hint_faults 479
numa_hint_faults_local 420
numa_pages_migrated 51
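
For anyone repeating the test, one rough way of reading these counters is
to look at the fraction of NUMA hinting faults that were node-local. The
sketch below parses the text shown above (on a live guest it would be the
contents of /proc/vmstat). The function names are hypothetical, and the
ratio is only an informal indicator, not something the kernel reports.

```python
# Parse the numa_* counters from /proc/vmstat-style text and compute the
# share of NUMA hinting faults that were node-local.
# The sample is copied from the guest output above; the function names
# are illustrative only.
VMSTAT_SAMPLE = """\
numa_hit 65987
numa_miss 0
numa_foreign 0
numa_interleave 14473
numa_local 58642
numa_other 7345
numa_pte_updates 596
numa_huge_pte_updates 0
numa_hint_faults 479
numa_hint_faults_local 420
numa_pages_migrated 51
"""


def parse_numa_counters(text):
    """Return a dict of numa_* counter names to integer values."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("numa_"):
            counters[parts[0]] = int(parts[1])
    return counters


def local_fault_ratio(counters):
    """Fraction of hinting faults that were local, or None if none occurred."""
    total = counters.get("numa_hint_faults", 0)
    if total == 0:
        return None
    return counters.get("numa_hint_faults_local", 0) / total


counters = parse_numa_counters(VMSTAT_SAMPLE)
ratio = local_fault_ratio(counters)
print(f"hint faults: {counters['numa_hint_faults']}, "
      f"local: {counters['numa_hint_faults_local']}, "
      f"ratio: {ratio:.3f}")
print(f"pages migrated: {counters['numa_pages_migrated']}")
```

Non-zero numa_pte_updates, numa_hint_faults and numa_pages_migrated are
what suggest that automatic NUMA balancing is actually doing something in
the guest.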

So, yes, I would say this works with Xen. Is that correct, Mel?

I'll try running more complex stuff like 'perf bench numa' inside the
guest and see what happens...

Regards,
Dario
