Merged Zone / Node in order to do containers etc easily?

From: Christoph Lameter
Date: Mon Mar 05 2007 - 13:55:14 EST


We have talked a bit in the last few days about eventually getting rid of
either nodes or zones.

If one merged the node and zone structs this would be possible.
Actually the current kernel supports something like it if the
following config options are not set:

CONFIG_ZONE_DMA
CONFIG_ZONE_DMA32
CONFIG_HIGHMEM

In that case we only have a single zone per node, but no support anymore
for DMA zones or highmem. We save the bits in page->flags that are
usually used to identify the zone. For simplicity's sake let's just call
these node / zone entities "zone".
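
To illustrate the point about page->flags, here is a minimal userspace
sketch of packing a node and a zone number into a flags word. The bit
widths, the layout and the helper names (encode_flags, flags_to_node,
flags_to_zone) are assumptions made up for this example and are not the
kernel's actual definitions. With one zone per node, ZONE_BITS would drop
to 0 and the node number alone would identify the "zone".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical widths and layout, chosen for illustration only. */
#define FLAGS_BITS 64
#define NODE_BITS  6   /* up to 64 nodes */
#define ZONE_BITS  2   /* up to 4 zones per node */

#define NODE_SHIFT (FLAGS_BITS - NODE_BITS)
#define ZONE_SHIFT (NODE_SHIFT - ZONE_BITS)

/* Pack node and zone numbers into the top bits of a flags word. */
static uint64_t encode_flags(unsigned node, unsigned zone)
{
    assert(node < (1u << NODE_BITS));
    assert(zone < (1u << ZONE_BITS));
    return ((uint64_t)node << NODE_SHIFT) | ((uint64_t)zone << ZONE_SHIFT);
}

/* Recover the node number from a flags word. */
static unsigned flags_to_node(uint64_t flags)
{
    return (unsigned)(flags >> NODE_SHIFT) & ((1u << NODE_BITS) - 1);
}

/* Recover the zone number from a flags word. */
static unsigned flags_to_zone(uint64_t flags)
{
    return (unsigned)(flags >> ZONE_SHIFT) & ((1u << ZONE_BITS) - 1);
}
```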

Let's say we also have CONFIG_NUMA set. Then:

A. We could add more "zones" via node hotplug.
B. We can identify the zones via a node number from user space and direct
allocations to a specific "zone".
C. We can migrate memory between "zones".
D. We have an indication how favorably these "zones" are to be used given
their SLIT distance.
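
As a rough illustration of point D, the following toy sketch picks the
"zone" with the smallest distance that still has free pages. The distance
table, NR_ZONES and pick_zone() are hypothetical and only model the idea.
This is not how the kernel builds or walks its zonelists.

```c
#include <assert.h>

#define NR_ZONES 3

/* Hypothetical SLIT-style distance table; 10 means local. */
static const int distance[NR_ZONES][NR_ZONES] = {
    { 10, 20, 40 },
    { 20, 10, 20 },
    { 40, 20, 10 },
};

/*
 * Return the zone closest to `from` that still has free pages,
 * or -1 if every zone is empty. Ties go to the lower zone number.
 */
static int pick_zone(int from, const unsigned long free_pages[NR_ZONES])
{
    int best = -1;

    assert(from >= 0 && from < NR_ZONES);
    for (int z = 0; z < NR_ZONES; z++) {
        if (free_pages[z] == 0)
            continue;
        if (best < 0 || distance[from][z] < distance[from][best])
            best = z;
    }
    return best;
}
```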

Let's call these "zones" that were generated during bootup "base zones".

Now we need some additional functionality. In particular we want to be
able to put some memory dynamically into containers and we need to find a
replacement for the DMA zones.

Let's create a new type of zone called "derived zones". These are based on
a base zone. An arbitrary number of MAX_ORDER blocks can be moved to these
and then they function like a regular "zone". They can be dynamically
created and deleted via the node hotplug interfaces.

So if we create a new container then we create a new zone and extract a
number of MAX_ORDER blocks from a base zone. The zone functions like a
base zone for the time that it exists and thus we have all the usual
accounting for the zone and do not need to add it separately. Reclaim
will work as for base zones, etc. (This only works if we have MAX_ORDER
blocks available, thus we would need Mel's defrag patches.) Applications
can be restricted to a container or containers by the cpuset
functionality. The built-in process migration in cpusets can move
applications. Processes can be manually moved through page migration.
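
The block transfer described above could be modeled roughly like this. It
is a userspace toy, not kernel code: struct toy_zone, toy_zone_grow() and
the block size of 1024 pages (4MB blocks of 4KB pages) are assumptions for
the sake of the example. The point is only that the per-zone accounting
follows the blocks, so a derived zone needs no accounting of its own.

```c
#include <assert.h>

/* Assumed block size: 1024 pages, i.e. 4MB blocks with 4KB pages. */
#define MAX_ORDER_NR_PAGES 1024UL

/* Hypothetical, heavily simplified stand-in for a zone. */
struct toy_zone {
    unsigned long free_blocks;   /* whole free MAX_ORDER blocks */
    unsigned long present_pages; /* pages accounted to this zone */
};

/*
 * Move `nr` free MAX_ORDER blocks from `base` to `derived`.
 * Returns 0 on success, -1 if the base zone does not have
 * enough whole free blocks (nothing is changed in that case).
 */
static int toy_zone_grow(struct toy_zone *derived, struct toy_zone *base,
                         unsigned long nr)
{
    if (base->free_blocks < nr)
        return -1;

    base->free_blocks -= nr;
    base->present_pages -= nr * MAX_ORDER_NR_PAGES;
    derived->free_blocks += nr;
    derived->present_pages += nr * MAX_ORDER_NR_PAGES;
    return 0;
}
```

Creating a container would then amount to starting with an empty toy_zone
and growing it. The failure path is the case where defragmentation would
be needed to produce more free MAX_ORDER blocks.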

If we need some DMA zones for a particular device then we can also create
a new zone and extract pages in a certain range from the base zone. This
could occur dynamically (but early during boot so that the low end pages
in a zone have not been used yet) if we discover that devices exist that
need restricted memory pools. Moreover these zones could be custom sized
for the devices that are challenged in a particular way. For example we
could dynamically create a 2GB pool for the strange SCSI device that can
only reliably do DMA using a 31 bit address.
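
For the 31 bit example the arithmetic is simple: 2^31 bytes is 2GB, which
with 4KB pages is 524288 page frames (pfn 0 through 524287). A small
sketch of the range check, assuming 4KB pages and using made-up helper
names (dma_limit_pfn, pfn_ok_for_device) that are not kernel interfaces:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumes 4KB pages */

/* Highest page frame number reachable with `bits` address bits. */
static uint64_t dma_limit_pfn(unsigned bits)
{
    assert(bits > PAGE_SHIFT && bits < 64);
    return ((UINT64_C(1) << bits) - 1) >> PAGE_SHIFT;
}

/* Nonzero if the page frame lies inside the device's reachable range. */
static int pfn_ok_for_device(uint64_t pfn, unsigned bits)
{
    return pfn <= dma_limit_pfn(bits);
}
```

A derived zone for such a device would only take pages from the base zone
for which pfn_ok_for_device(pfn, 31) holds.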

That leaves the HIGHMEM out cold so far but HIGHMEM is not needed on 64
bit platforms as far as I can tell. Maybe HIGHMEM could also be some sort
of derived zone with memory taken from the base zone used as the memmap
and as bounce buffers etc?