Re: 2.7 thoughts

From: William Lee Irwin III
Date: Fri Oct 10 2003 - 05:49:23 EST


At some point in the past, I wrote:
>> arena. The two issues mentioned above are in reality non-issues.

On Fri, Oct 10, 2003 at 07:26:26PM +0900, YoshiyaETO wrote:
> Could tell me what is the real issue you think?

Using reserved areas for kernel metadata marked as ZONE_HIGHMEM creates
two serious problems. First memory placement of kernel metadata is
disturbed, just like the "all kernel memory references are on node 0"
problem on ia32 NUMA. Second, it also inherits all of the workload
feasibility problems also seen on ia32 PAE. A third, less serious
problem is that the confusions between notions such as the capability
to do io to memory vs. placement in ZONE_HIGHMEM, the likely newly
introduced confusion between being able to directly access the data
in ZONE_HIGHMEM without establishing a new mapping (TLB entry) vs.
copying overheads etc., and the placement of some pinned/wired kernel
metadata like leaf pagetable nodes in ZONE_HIGHMEM, are all nasty
details conveniently left out of the quickly rattled-off zoning schemes
for inhibiting pinned/wired allocations from offlineable memory.

The first point explodes the entire theory of nodes coming on and off
line; the moment node-local kernel data is allowed, the entire zoning
scheme falls apart and the node can't be fully offlined at will as
envisioned at all.

The second point raises the further question of whether allowing the
zoning to get anywhere near 64-bit is even desirable; Linux tends to
want to consume all physical memory for kernel metadata as it pleases,
and confining it to small subsets of it has proved incompatible with
its design and/or the wishes of the kernel community in the past. The
general notions of process-local paged kernel metadata and the like
which would create some allowance for relocatable kernel metadata (or
in the PAE case, reduce pressure on lowmem) have been proposed before
and met heavy resistance. OTOH, certain things, like mem_map[] and
pgdats, could easily be made node-local as they would be torn down
when the node is offlined anyway.


-- wli

(The answers to the issues raised in the third point are all basically
API cleanups and the implementation of a thing that should have been in
the kernel to begin with.)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/