Re: Fw: NUMA allocator on Opteron systems does non-local allocation on node0

From: Oliver Weihe
Date: Fri Oct 17 2008 - 04:08:49 EST


Hi,

this problem/question is already solved for me. Andi suggested posting
it to the linux-mm mailing list, and they helped me. :)

> > I've noticed that on NUMA systems (Opterons) memory is allocated on
> > non-local nodes for processes running on node0, even if local memory
> > is available. (Kernel 2.6.25 and above)
>
> How much local memory is available? 8GB per node? That means there will
> be 4GB on node 0 in ZONE_DMA32 and 4GB in ZONE_NORMAL. Other nodes will
> have 8GB in ZONE_NORMAL.

You're right. This machine has 8GiB per node. Due to the memory hole the
machine has ~3GiB in ZONE_DMA32, which fits my observations perfectly.
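
For anyone who wants to check the zone sizes on their own machine, here
is a minimal sketch that sums up the "present" page counts per node and
zone from the text of /proc/zoneinfo. The function names are made up for
this example, and the 4 KiB page size is an assumption (true for x86_64).

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def present_pages_by_zone(zoneinfo_text):
    """Return {(node, zone_name): present_pages} parsed from zoneinfo text."""
    sizes = {}
    current = None
    for line in zoneinfo_text.splitlines():
        # Section headers look like "Node 0, zone    DMA32".
        header = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            current = (int(header.group(1)), header.group(2))
            continue
        # Inside a section, "present" holds the number of pages in the zone.
        field = re.match(r"\s+present\s+(\d+)", line)
        if field and current is not None:
            sizes[current] = int(field.group(1))
    return sizes


def to_gib(pages):
    """Convert a page count to GiB, using the assumed page size."""
    return pages * PAGE_SIZE / 2**30
```

Feeding it the contents of /proc/zoneinfo and printing to_gib() for each
entry should show roughly how much of node 0 sits in DMA32 versus Normal.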


> > In my setup I'm allocating an array of ~7GiB in a single-threaded
> > application.
> > Startup: numactl --cpunodebind=X ./app
> > For X=1,2,3 it works as expected: all memory is allocated on the local
> > node.
> > For X=0 I can see the memory being allocated on node0 until only ~3GiB
> > are left "free" on node0. At this point the kernel starts using memory
> > from node1 for the app!
>
> NUMA only supports memory policies for the highest zone which is
> ZONE_NORMAL here. Only 4GB of ZONE_NORMAL are available on node 0, so
> it will go off node after that memory is exhausted. This is done in
> order to preserve the lower 4GB for I/O to 32 bit devices.
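
To make the effect concrete, here is a toy model of the fallback list
under the two orderings that numa_zonelist_order can select ("zone" and
"node"). It is only a sketch: the function names are hypothetical, the
sizes are rough values for this machine (node 0 with ~5GiB Normal and
~3GiB DMA32), and remote nodes are visited by ascending id rather than
by real NUMA distance.

```python
def build_zonelist(local_node, nodes, order):
    """Return the fallback list of (node, zone) pairs for local_node.

    nodes: {node_id: {"Normal": GiB, "DMA32": GiB}}; order: "node" or "zone".
    Simplification: remote nodes are sorted by id, not by NUMA distance.
    """
    node_ids = [local_node] + sorted(n for n in nodes if n != local_node)
    zones = ["Normal", "DMA32"]  # highest zone first
    if order == "node":
        # Exhaust every zone of a node before moving to the next node.
        pairs = [(n, z) for n in node_ids for z in zones]
    elif order == "zone":
        # Exhaust the highest zone on all nodes before touching DMA32.
        pairs = [(n, z) for z in zones for n in node_ids]
    else:
        raise ValueError(order)
    return [(n, z) for n, z in pairs if nodes[n].get(z, 0) > 0]


def allocate(request_gib, local_node, nodes, order):
    """Greedily take memory along the zonelist; return GiB taken per node."""
    taken = {}
    remaining = request_gib
    for node, zone in build_zonelist(local_node, nodes, order):
        if remaining <= 0:
            break
        chunk = min(remaining, nodes[node][zone])
        taken[node] = taken.get(node, 0) + chunk
        remaining -= chunk
    return taken


# Approximate layout of the machine discussed in this thread (GiB).
machine = {
    0: {"Normal": 5, "DMA32": 3},
    1: {"Normal": 8},
    2: {"Normal": 8},
    3: {"Normal": 8},
}
```

In this model a 7GiB request from node 0 spills ~2GiB onto node 1 with
"zone" ordering and stays entirely on node 0 with "node" ordering, which
is the behaviour described above.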

I've changed the policy from "default" to "node"
(/proc/sys/vm/numa_zonelist_order) and now it works fine for me.
The "default" policy automatically selects "node" or "zone" depending on
the machine. When the policy is set to "default", the kernel (2.6.27)
chooses "node" if any of the following holds:
1. there is no ZONE_DMA32
2. the size of ZONE_DMA32 is greater than 50% of the system memory
3. the size of ZONE_DMA32 is greater than 60% of the node-local memory
Otherwise it chooses "zone".
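
The three conditions can be written down as a small decision function.
This is a sketch of the rules as listed in this message, not a copy of
the kernel code: the function name and the GiB-based inputs are
assumptions for illustration, and the thresholds are the ones quoted
above rather than values checked against the source.

```python
def default_zonelist_order(dma32_gib_per_node, total_gib_per_node):
    """Pick "node" or "zone" following the three rules listed above.

    dma32_gib_per_node: {node_id: GiB in ZONE_DMA32} (missing node = 0)
    total_gib_per_node: {node_id: total GiB on that node}
    """
    total = sum(total_gib_per_node.values())
    dma32 = sum(dma32_gib_per_node.values())

    # Rule 1: no ZONE_DMA32 at all.
    if dma32 == 0:
        return "node"
    # Rule 2: ZONE_DMA32 is more than 50% of the system memory.
    if dma32 > 0.5 * total:
        return "node"
    # Rule 3: on some node, ZONE_DMA32 is more than 60% of the local memory.
    for node, node_total in total_gib_per_node.items():
        if dma32_gib_per_node.get(node, 0) > 0.6 * node_total:
            return "node"
    return "zone"
```

For this machine (4 nodes with 8GiB each, ~3GiB of DMA32 on node 0) none
of the rules match, so "default" ends up as "zone", which is consistent
with the off-node allocations I saw before switching to "node".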


--

Regards,
Oliver Weihe
