Re: [PATCH 07/10] bootmem: add free_bootmem_late
From: Ingo Molnar
Date: Wed Oct 28 2009 - 03:49:19 EST
* FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx> wrote:
> From: Chris Wright <chrisw@xxxxxxxxxxxx>
>
> Add a new function for freeing bootmem after the bootmem allocator has
> been released and the unreserved pages given to the page allocator.
> This allows us to reserve bootmem and then release it if we later
> discover it was not needed.
>
> Reviewed-by: FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>
> Signed-off-by: Chris Wright <chrisw@xxxxxxxxxxxx>
> ---
> include/linux/bootmem.h | 1 +
> mm/bootmem.c | 43 ++++++++++++++++++++++++++++++++++++++-----
> 2 files changed, 39 insertions(+), 5 deletions(-)
Hm, we are now further complicating the bootmem model.
I think we could remove the bootmem allocator middle man altogether.
This can be done by initializing the page allocator sooner and by
extending (already existing) 'reserve memory early on' mechanisms in
architecture code. (the reserve_early*() APIs in x86 for example)
Right now we have 5 memory allocation models on x86, initialized
gradually:
- allocator (buddy) [generic]
- early allocator (bootmem) [generic]
- very early allocator (reserve_early*()) [x86]
- very very early allocator (early brk model) [x86]
- very very very early allocator (build time .data/.bss) [generic]
Seems excessive.
The reserve_early() method is list/range based and can handle vast
amounts of not very fragmented memory - perfect for basically all the
real bootmem purposes (which is to bootstrap the buddy).
Memory allocated via reserve_early() could be freed into the buddy later
on as well. The main reason bootmem is 'destroyed' during free-to-buddy
is that it has excessive internal bitmaps we want to free. With a
list/range based reserve_early() mechanism there's no such problem: the
ranges can linger indefinitely and there's near zero allocation
management overhead.
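To illustrate the list/range idea, here is a rough userspace sketch of a
range table. It is not the actual x86 reserve_early() code: the names
sketch_reserve_early(), sketch_free_early(), struct early_range and
MAX_EARLY_RES are made up for this example. The point is only that the
bookkeeping is a small fixed array of [start, end) pairs, independent of
how much memory the ranges cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EARLY_RES 32	/* hypothetical table size, not a kernel constant */

/* One reserved physical range: start is inclusive, end is exclusive. */
struct early_range {
	uint64_t start;
	uint64_t end;
};

static struct early_range early_res[MAX_EARLY_RES];
static size_t early_res_count;

/*
 * Record [start, end) as reserved.
 * Returns 0 on success, -1 if the range is empty, overlaps an
 * existing reservation, or the table is full.
 */
static int sketch_reserve_early(uint64_t start, uint64_t end)
{
	size_t i;

	if (start >= end || early_res_count == MAX_EARLY_RES)
		return -1;

	for (i = 0; i < early_res_count; i++)
		if (start < early_res[i].end && early_res[i].start < end)
			return -1;

	early_res[early_res_count].start = start;
	early_res[early_res_count].end = end;
	early_res_count++;
	assert(early_res_count <= MAX_EARLY_RES);
	return 0;
}

/*
 * Drop a reservation that exactly matches [start, end), as one would
 * before handing the range to the page allocator.
 * Returns 0 on success, -1 if no such reservation exists.
 */
static int sketch_free_early(uint64_t start, uint64_t end)
{
	size_t i;

	for (i = 0; i < early_res_count; i++) {
		if (early_res[i].start == start && early_res[i].end == end) {
			/* Fill the hole with the last entry; order is not kept. */
			early_res[i] = early_res[--early_res_count];
			return 0;
		}
	}
	return -1;
}
```

The table costs MAX_EARLY_RES * sizeof(struct early_range) bytes no
matter how large the reserved ranges are, whereas a bitmap needs one bit
per page of managed memory. That is the overhead difference the
paragraph above refers to.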
reserve_early() might need some small amount of extra work before it can
be used as a generic early allocator - like adding a node field to it
(so that the buddy can then pick those ranges up in a NUMA aware
fashion) - but nothing very complex.
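As a sketch of what the node field could look like (again with made-up
names: struct early_range_nid, sketch_pages_on_node() and
SKETCH_PAGE_SHIFT are assumptions for illustration, not existing kernel
symbols), each range would carry a node id and a consumer could walk the
table per node:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* A reserved range tagged with the NUMA node it belongs to. */
struct early_range_nid {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
	int nid;		/* hypothetical node id field */
};

/*
 * Count the whole pages covered by the ranges of one node. A real
 * consumer would release these pages to that node's free lists
 * instead of just counting them.
 */
static uint64_t sketch_pages_on_node(const struct early_range_nid *tbl,
				     size_t n, int nid)
{
	uint64_t pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(tbl[i].start <= tbl[i].end);
		if (tbl[i].nid == nid)
			pages += (tbl[i].end - tbl[i].start) >> SKETCH_PAGE_SHIFT;
	}
	return pages;
}
```

With the node id stored next to each range, the walk needs no lookup
beyond the table itself.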
Thoughts?
Ingo