Re: [PATCH] change global zonelist order on NUMA v2

From: Lee Schermerhorn
Date: Mon Apr 30 2007 - 10:11:00 EST


On Fri, 2007-04-27 at 09:27 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 26 Apr 2007 08:48:19 -0700 (PDT)
> Christoph Lameter <clameter@xxxxxxx> wrote:
>
> > On Thu, 26 Apr 2007, KAMEZAWA Hiroyuki wrote:
> >
> > > (1)Use new zonelist ordering always and move init_task's tied cpu to a
> > > cpu on the best node.
> > > Child processes will start in good nodes even if Node 0 has small memory.
> >
> > How about renumbering the nodes? Node 0 is the one with no DMA memory and
> > node 1 may be the one with the DMA? That would take care of things even
> > without core modifications. We can start on node 0 (which is hardware node 1) and
> > consume the required memory for boot there not impacting the node with the
> > DMA memory.
> >
> It seems a bit complicated. If we do so, the following can occur:
>
> Node1: cpu0,1,2,3
> Node0: cpu4,5,6,7
>
> The system layout may then look unintuitive.

Interesting. A colleague recently showed me that this can occur on HP
platforms if we boot from, say, node 1 instead of node 0. The kernel
doesn't mind because it maintains a translation of cpus to nodes and
vice versa. Applications needn't mind either, provided they use libnuma's
numa_node_to_cpus() rather than assuming a fixed relationship. But I
agree that it may surprise some people when/if node_id !=
cpu_id / cpus_per_node.
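
To make the difference concrete, here is a minimal sketch in C of a
table-driven cpu/node translation next to the fixed-relationship guess,
using the layout quoted above (cpu0-3 on node 1, cpu4-7 on node 0). This is
neither kernel nor libnuma code: the table, the constants and all function
names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sizes for the example layout; not kernel constants. */
#define EX_NR_CPUS       8
#define EX_CPUS_PER_NODE 4

/* Hypothetical translation table: cpu0-3 -> node 1, cpu4-7 -> node 0. */
static const int ex_cpu_to_node[EX_NR_CPUS] = { 1, 1, 1, 1, 0, 0, 0, 0 };

/* Table lookup: correct for any layout the table describes. */
static int ex_node_of_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < EX_NR_CPUS);
    return ex_cpu_to_node[cpu];
}

/* Fixed-relationship guess: assumes node_id == cpu_id / cpus_per_node. */
static int ex_guess_node_of_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < EX_NR_CPUS);
    return cpu / EX_CPUS_PER_NODE;
}

/* Reverse translation: bitmask of the cpus that belong to a node. */
static unsigned int ex_cpus_of_node(int node)
{
    unsigned int mask = 0;

    for (int cpu = 0; cpu < EX_NR_CPUS; cpu++)
        if (ex_cpu_to_node[cpu] == node)
            mask |= 1u << cpu;
    return mask;
}

/* Count the cpus for which the guess disagrees with the table. */
static int ex_count_wrong_guesses(void)
{
    int wrong = 0;

    for (int cpu = 0; cpu < EX_NR_CPUS; cpu++)
        if (ex_guess_node_of_cpu(cpu) != ex_node_of_cpu(cpu))
            wrong++;
    return wrong;
}

/* Print table lookup and guess side by side for every cpu. */
static void ex_print_layout(void)
{
    for (int cpu = 0; cpu < EX_NR_CPUS; cpu++)
        printf("cpu%d: node %d (guess: node %d)\n",
               cpu, ex_node_of_cpu(cpu), ex_guess_node_of_cpu(cpu));
}
```

Calling ex_print_layout() from a main() shows the guess disagreeing with the
table for every cpu in this layout, while code that goes through the table
in either direction is unaffected by the numbering.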

Lee
