Re: [PATCH 4/7] ppc64 - Specify amount of kernel memory at boot time

From: Mel Gorman
Date: Fri Feb 24 2006 - 04:03:04 EST


On Thu, 23 Feb 2006, Dave Hansen wrote:

On Thu, 2006-02-23 at 18:01 +0000, Mel Gorman wrote:
On Thu, 23 Feb 2006, Dave Hansen wrote:
OK, back to the hapless system admin using kernelcore. They have a
4-node system with 2GB of RAM in each node for 8GB total. They use
kernelcore=1GB. They end up with 4x1GB ZONE_DMA and 4x1GB
ZONE_EASYRCLM. Perfect. You can safely remove 4GB of RAM.

Now, imagine that the machine has been heavily used for a while, there
is only 1 node's memory available, but CPUs are available in the same
places as before. So, you start up your partition again have 8GB of
memory in one node. Same kernelcore=1GB option. You get 1x7GB ZONE_DMA
and 1x1GB ZONE_EASYRCLM. I'd argue this is going to be a bit of a
surprise to the poor admin.


That sort of surprise is totally unacceptable but the behaviour of
kernelcore needs to be consistent on both the x86 and the ppc (any any
other ar. How about;

1. kernelcore=X determines the total amount of memory for !ZONE_EASYRCLM
(be it ZONE_DMA, ZONE_NORMAL or ZONE_HIGHMEM)

Sounds reasonable. But, if you're going to do that, should we just make
it the opposite and explicitly be easy_reclaim_mem=? Do we want the
limit to be set as "I need this much kernel memory", or "I want this
much removable memory". I dunno.


I think we should keep it at kernelcore=. If you have too little easyrclm memory, then hot-remove and hugetlb availability is impaired. If you have too little kernel memory, you have a really bad day.

2. For every node that can have ZONE_EASYRCLM, split the kernelcore across
the nodes as a percentage of the node size

Example: 4 nodes, 1 GiB each, kernelcore=512MB
node 0 ZONE_DMA = 128MB
node 1 ZONE_DMA = 128MB
node 2 ZONE_DMA = 128MB
node 3 ZONE_DMA = 128MB

2 nodes, 3GiB and 1GIB, kernelcore=512MB
node 0 ZONE_DMA = 384
node 1 ZONE_DMA = 128

It gets a bit more complex on NUMA for x86 because ZONE_NORMAL is
involved but the idea would essentially be the same.

Yes, chopping it up seems like the right thing (or as close as we can
get) to me.


Ok, will rework the code to make it happen.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/