Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19

From: Dave Hansen
Date: Tue Nov 01 2005 - 10:23:24 EST


On Tue, 2005-11-01 at 16:01 +0100, Ingo Molnar wrote:
> so it's all about expectations: _could_ you reasonably remove a piece of
> RAM? Customer will say: "I have stopped all nonessential services, and
> free RAM is at 90%, still I cannot remove that piece of faulty RAM, fix
> the kernel!".

That's an excellent example. Until we have some kind of kernel
remapping, breaking the 1:1 kernel virtual mapping, these pages will
always exist. The easiest example of this kind of memory is kernel
text.

Another example might be a somewhat errant device driver which has
allocates some large buffers and is doing DMA to or from them. In this
case, we need to have APIs to require devices to give up and reacquire
any dynamically allocated structures. If the device driver does not
implement these APIs it is not compatible with memory hotplug.

> > There is also no precedent in existing UNIXes for a 100% solution.
>
> does this have any relevance to the point, other than to prove that it's
> a hard problem that we should not pretend to be able to solve, without
> seeing a clear path towards a solution?

Agreed. It is a hard problem. One that some other UNIXes have not
fully solved.

Here are the steps that I think we need to take. Do you see any holes
in their coverage? Anything that seems infeasible?

1. Fragmentation avoidance
* by itself, increases likelyhood of having an area of memory
which might be easily removed
* very small (if any) performance overhead
* other potential in-kernel users
* creates infrastructure to enforce the "hotplugablity" of any
particular are of memory.
2. Driver APIs
* Require that drivers specifically request for areas which must
retain constant physical addresses
* Driver must relinquish control of such areas upon request
* Can be worked around by hypervisors
3. Break 1:1 Kernel Virtual/Physial Mapping
* In any large area of physical memory we wish to remove, there will
likely be very, very few straggler pages, which can not easily be
freed.
* Kernel will transparently move the contents of these physical pages
to new pages, keeping constant virtual addresses.
* Negative TLB overhead, as in-kernel large page mappings are broken
down into smaller pages.
* __{p,v}a() become more expensive, likely a table lookup

I've already done (3) on a limited basis, in the early days of memory
hotplug. Not the remapping, just breaking the 1:1 assumptions. It
wasn't too horribly painful.

We'll also need to make some decisions along the way about what to do
about thinks like large pages. Is it better to just punt like AIX and
refuse to remove their areas? Break them down into small pages and
degrade performance?

-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/