Re: memory hotremove prototype, take 3

From: IWAMOTO Toshihiro
Date: Thu Dec 04 2003 - 10:44:58 EST


At Wed, 03 Dec 2003 21:38:54 -0800,
Martin J. Bligh <mbligh@xxxxxxxxxxx> wrote:
> > My target is somewhat NUMA-ish and fairly large. So I'm not sure if
> > CONFIG_NONLINEAR fits, but CONFIG_NUMA isn't perfect either.
>
> If your target is NUMA, then you really, really need CONFIG_NONLINEAR.
> We don't support multiple pgdats per node, nor do I wish to, as it'll
> make an unholy mess ;-). With CONFIG_NONLINEAR, the discontiguities
> within a node are buried down further, so we have much less complexity
> to deal with from the main VM. The abstraction also keeps the poor
> VM engineers trying to read / write the code saner via simplicity ;-)

IIRC, memory is contiguous within a NUMA node. I think Goto-san will
clarify this issue when his code gets ready. :-)

> WRT generic discontigmem support (not NUMA), doing that via pgdats
> should really go away, as there's no real difference between the
> chunks of physical memory as far as the page allocator is concerned.
> The plan is to use Daniel's nonlinear stuff to replace that, and keep
> the pgdats strictly for NUMA. Same would apply to hotpluggable zones -
> I'd hate to end up with 512 pgdats of stuff that are really all the
> same memory types underneath.

Yes. Unnecessary zone rebalancing would suck.

> The real issue you have is the mapping of the struct pages - if we can
> acheive a non-contig mapping of the mem_map / lmem_map array, we should
> be able to take memory on and offline reasonably easy. If you're willing
> for a first implementation to pre-allocate the struct page array for
> every possible virtual address, it makes life a lot easier.

Preallocating struct page array isn't feasible for the target system
because max memory / min memory ratio is large.
Our plan is to use the beginning (or the end) of the memory block being
hotplugged. If a 2GB memory block is added, first ~20MB is used for
the struct page array for the rest of the memory block.


> >> PS. What's this bit of the patch for?
> >>
> >> void *vmalloc(unsigned long size)
> >> {
> >> +#ifdef CONFIG_MEMHOTPLUGTEST
> >> + return __vmalloc(size, GFP_KERNEL, PAGE_KERNEL);
> >> +#else
> >> return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL);
> >> +#endif
> >> }
> >
> > This is necessary because kernel memory cannot be swapped out.
> > Only highmem can be hot removed, though it doesn't need to be highmem.
> > We can define another zone attribute such as GFP_HOTPLUGGABLE.
>
> You could just lock the pages, I'd think? I don't see at a glance
> exactly what you were using this for, but would that work?

I haven't seriously considered to implement vmalloc'd memory, but I
guess that would be too complicated if not impossible.
Making kernel threads or interrupt handlers block on memory access
sound very difficult to me.

--
IWAMOTO Toshihiro
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/