On Thu, 2006-02-23 at 18:01 +0000, Mel Gorman wrote:On Thu, 23 Feb 2006, Dave Hansen wrote:OK, back to the hapless system admin using kernelcore. They have a
4-node system with 2GB of RAM in each node for 8GB total. They use
kernelcore=1GB. They end up with 4x1GB ZONE_DMA and 4x1GB
ZONE_EASYRCLM. Perfect. You can safely remove 4GB of RAM.
Now, imagine that the machine has been heavily used for a while, there
is only 1 node's memory available, but CPUs are available in the same
places as before. So, you start up your partition again have 8GB of
memory in one node. Same kernelcore=1GB option. You get 1x7GB ZONE_DMA
and 1x1GB ZONE_EASYRCLM. I'd argue this is going to be a bit of a
surprise to the poor admin.
That sort of surprise is totally unacceptable but the behaviour of
kernelcore needs to be consistent on both the x86 and the ppc (any any
other ar. How about;
1. kernelcore=X determines the total amount of memory for !ZONE_EASYRCLM
(be it ZONE_DMA, ZONE_NORMAL or ZONE_HIGHMEM)
Sounds reasonable. But, if you're going to do that, should we just make
it the opposite and explicitly be easy_reclaim_mem=? Do we want the
limit to be set as "I need this much kernel memory", or "I want this
much removable memory". I dunno.
2. For every node that can have ZONE_EASYRCLM, split the kernelcore across
the nodes as a percentage of the node size
Example: 4 nodes, 1 GiB each, kernelcore=512MB
node 0 ZONE_DMA = 128MB
node 1 ZONE_DMA = 128MB
node 2 ZONE_DMA = 128MB
node 3 ZONE_DMA = 128MB
2 nodes, 3GiB and 1GIB, kernelcore=512MB
node 0 ZONE_DMA = 384
node 1 ZONE_DMA = 128
It gets a bit more complex on NUMA for x86 because ZONE_NORMAL is
involved but the idea would essentially be the same.
Yes, chopping it up seems like the right thing (or as close as we can
get) to me.