Re: [PATCH 0/3] Use one zonelist per node instead of multiplezonelists v2

From: Lee Schermerhorn
Date: Wed Aug 08 2007 - 18:40:57 EST


On Wed, 2007-08-08 at 22:44 +0100, Mel Gorman wrote:
> On (08/08/07 14:30), Lee Schermerhorn didst pronounce:
> > On Wed, 2007-08-08 at 10:36 -0700, Christoph Lameter wrote:
> > > On Wed, 8 Aug 2007, Mel Gorman wrote:
> > >
<snip>
> > > > o Remove bind_zonelist() (Patch in progress, very messy right now)
> > >
> > > Will this also allow us to avoid always hitting the first node of an
> > > MPOL_BIND first?
> >
> > An idea:
> >
> > Apologies if someone already suggested this and I missed it. Too much
> > traffic...
> >
> > instead of passing a zonelist for BIND policy, how about passing [to
> > __alloc_pages(), I think] a starting node, a nodemask, and gfp flags for
> > zone and modifiers.
>
> Yes, this has come up before although it wasn't my initial suggestion. I
> thought maybe it was yours but I'm not sure anymore. I'm working through
> it at the moment.

I've heard/seen Christoph mention passing a nodemask to alloc_pages a
few times, but hadn't seen any of the details. Got me thinking..

> With the patch currently, a a nodemask is passed in for
> filtering which should be enough as the zonelist being used should be enough
> information to indicate the starting node.

It'll take me a while to absorb the patch, so I'll just ask: Where does
the zonelist for the argument come from? If the the bind policy
zonelist is removed, then does it come from a node? There'll be only
one per node with your other patches, right? So you had to have a node
id, to look up the zonelist? Do you need the zonelist elsewhere,
outside of alloc_pages()? If not, why not just let alloc_pages look it
up from a starting node [which I think can be determined from the
policy]?

OK, that's a lot of questions. no need to answer. That's just what I'm
thinking re: all this. I'll wait and see how the patch develops.

>
> The signature of __alloc_pages() becomes
>
> static page * fastcall
> __alloc_pages_nodemask(gfp_t gfp_mask, nodemask_t *nodemask,
> unsigned int order, struct zonelist *zonelist)
>
> > For various policies, the arguments would look like this:
> > Policy start node nodemask
> >
> > default local node cpuset_current_mems_allowed
> >
> > preferred preferred_node cpuset_current_mems_allowed
> >
> > interleave computed node cpuset_current_mems_allowed
> >
> > bind local node policy nodemask [replaces bind
> > zonelist in mempolicy]
> >
>
> The last one is the most interesting. Much of the patch in development
> involves deleting the custom node stuff. I've included the patch below if
> you're curious. I wanted to get one-zonelist out first to see if we could
> agree on that before going further with it.

Again, it'll be a while.

Thanks,
Lee


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/