Re: [RFC PATCH V5] mm readahead: Fix readahead fail for no local memory and limit readahead pages

From: David Rientjes
Date: Thu Feb 13 2014 - 17:41:19 EST


On Thu, 13 Feb 2014, Raghavendra K T wrote:

> Thanks David, unfortunately even after applying that patch, I do not see
> the improvement.
>
> Interestingly, numa_mem_id() seems to still return the value of a
> memoryless node.
> Maybe the per-cpu _numa_mem_ values are not set properly. Need to dig into this....
>

I believe ppc relies on __build_all_zonelists() to set numa_mem_id() to
the proper node, and that in turn relies on the ordering of the zonelist
built for the memoryless node. It would be very strange for
local_memory_node() to return a memoryless node, since it returns the node
of the first zone in node_zonelist(GFP_KERNEL) (why would a memoryless node
be on the zonelist at all?).
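To make the "why would it be on the zonelist at all" point concrete, here is a rough userspace model, not kernel code. It assumes one zone per node and a made-up four-node topology, and every model_* name is hypothetical. My understanding is that build_zonelists_node() only adds populated zones, which is what the skip below imitates, so the first entry of a memoryless node's zonelist should always belong to some other node.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4

/* Hypothetical toy topology: pages present on each node, one zone per node. */
struct model_topology {
	unsigned long present_pages[MODEL_MAX_NODES];
};

/*
 * Build a simplified zonelist for one node by walking nodes in preference
 * order. Unpopulated zones are skipped, so a memoryless node never shows up
 * in anyone's zonelist. Returns the number of entries written to zonelist[].
 */
static size_t model_build_zonelist(const struct model_topology *topo,
				   const int *pref_order, size_t nr_pref,
				   int zonelist[MODEL_MAX_NODES])
{
	size_t nr = 0;

	for (size_t i = 0; i < nr_pref; i++) {
		int nid = pref_order[i];

		assert(nid >= 0 && nid < MODEL_MAX_NODES);
		if (topo->present_pages[nid] == 0)
			continue;	/* memoryless node: nothing to allocate from */
		assert(nr < MODEL_MAX_NODES);
		zonelist[nr++] = nid;
	}
	return nr;
}

/* Like local_memory_node(): the node owning the first zone in the zonelist. */
static int model_local_memory_node(const int *zonelist, size_t nr)
{
	return nr ? zonelist[0] : -1;	/* -1: no memory on any node */
}
```

With node 1 memoryless and a preference order of 1, 0, 2, 3, node 1's model zonelist starts at node 0, so the model's local_memory_node() reports node 0 rather than node 1.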

I think the real problem is that build_all_zonelists() is only called at
init, when only the boot cpu is online, so it only sets numa_mem_id()
properly for the boot cpu. Does it return a node with memory if you
toggle /proc/sys/vm/numa_zonelist_order? Do

echo node > /proc/sys/vm/numa_zonelist_order
echo zone > /proc/sys/vm/numa_zonelist_order
echo default > /proc/sys/vm/numa_zonelist_order

and check whether it returns the proper value at any of those points.
Writing the sysctl forces build_all_zonelists() to run again, and since
all cpus are now online, numa_mem_id() should then point to the proper
node.

So the prerequisite for CONFIG_HAVE_MEMORYLESS_NODES is an arch-specific
set_numa_mem() that makes this mapping correct, as ia64 does. If that's
the case, then (1) it's completely undocumented and (2) Nishanth's patch
is incomplete, because anything that enables CONFIG_HAVE_MEMORYLESS_NODES
needs to do the proper set_numa_mem() for numa_mem_id() to be any
different from numa_node_id().
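A hedged userspace sketch of the per-cpu hook described above, not the actual ia64 or ppc code: model_arch_cpu_online() is a hypothetical stand-in for whatever arch path runs as each cpu comes online, and the topology (node 1 memoryless, falling back to node 0) is assumed. It also models the point that without CONFIG_HAVE_MEMORYLESS_NODES, numa_mem_id() is simply numa_node_id().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Set to false to model a kernel built without CONFIG_HAVE_MEMORYLESS_NODES. */
static bool model_have_memoryless_nodes = true;

static int model_cpu_to_node[MODEL_NR_CPUS] = { 0, 1, 1, 2 };
static int model_numa_mem[MODEL_NR_CPUS];	/* stands in for per-cpu _numa_mem_ */

/* Assumed topology: node 1 is memoryless and falls back to node 0. */
static int model_local_memory_node(int nid)
{
	return nid == 1 ? 0 : nid;
}

/* Hypothetical arch hook: each cpu sets its own numa_mem as it comes online. */
static void model_arch_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (model_have_memoryless_nodes)
		model_numa_mem[cpu] = model_local_memory_node(model_cpu_to_node[cpu]);
}

/* numa_mem_id() only differs from numa_node_id() when the option is enabled. */
static int model_numa_mem_id(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!model_have_memoryless_nodes)
		return model_cpu_to_node[cpu];
	return model_numa_mem[cpu];
}
```

Because the fixup happens per cpu at online time, the result no longer depends on how many cpus were up when build_all_zonelists() last ran.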