Re: [PATCH] allocate page cache pages in round robin fashion

From: Nick Piggin
Date: Fri Aug 13 2004 - 20:22:46 EST


Martin J. Bligh wrote:
> Well, either we're:
>
> 1. Falling back and putting all our most recent accesses off-node.
>
> or.
>
> 2. Not falling back and only able to use one node's memory for any one
>    (single threaded) app.
>
> Either situation is crap, though I'm not sure which turd we picked right
> now ... I'd have to look at the code again ;-) I thought it was 2, but
> I might be wrong.


I'm looking at this now. We are doing 1 currently.


> In theory, yes. In practice, I have a feeling kswapd will keep us above
> the level of free memory where we'd fall back to another zone to allocate,
> won't it?

Nope. Take a look at the first loop-through-the-zones in alloc_pages
(preferably in akpm's tree that is cleaned up a bit).

We go through *all* zones first and allocate them down to pages_low
before kicking kswapd.

I have tried kicking kswapd before going off node, but it frees memory
really aggressively - so you're nearly left with a local alloc policy.


> > There are a couple of issues. The first is that you need to minimise
> > regressions for when working set size is bigger than the local node.


> Good point ... that is, indeed, a total bitch to fix.


At the end of the day we'll possibly just have to have a sysctl. I
don't think all regressions could be eliminated completely. We'll
see.


> > I have a patch going now that just reclaims use-once file cache before
> > going off node. Seems to help a bit for basic things that just push
> > pagecache through the system. It definitely reduces remote allocations
> > by several orders of magnitude for those cases.


> Makes sense, but doesn't the same thing make sense on a global basis?
> I don't feel NUMA is anything magical here ...


Didn't parse that. If you mean the transition from highmem->normal->dma
zones, I don't think that should be treated the same way - imagine small
highmem zones, for example. We have the lower zone protection in place
for that case, and that in turn isn't good for NUMA: the SGI guys
(I think) already ran into it and fixed it to be per-node only.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/