Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19

From: Martin J. Bligh
Date: Fri Nov 04 2005 - 10:40:01 EST


>> just to make sure i didn't get it wrong, wouldn't we get most of the
>> benefits Andy is seeking by having a boot-time option which sets aside
>> a "hugetlb zone", with an additional sysctl to grow (or shrink) the pool
>> - with the growing happening on a best-effort basis, without guarantees?
>
> Boot-time option to set the hugetlb zone, yes.
>
> Grow-or-shrink, probably not. Not in practice after bootup on any machine
> that is less than idle.
>
> The zones have to be pretty big to make any sense. You don't just grow
> them or shrink them - they'd be on the order of tens of megabytes to
> gigabytes. In other words, sized big enough that you will _not_ be able to
> create them on demand, except perhaps right after boot.
>
> Growing these things later simply isn't reasonable. I can pretty much
> guarantee that any kernel I maintain will never have dynamic kernel
> pointers: when some memory has been allocated with kmalloc() (or
> equivalent routines - pretty much _any_ kernel allocation), it stays put.
> Which means that if there is a _single_ kernel alloc in such a zone, it
> won't ever be then usable for hugetlb stuff.
>
> And I don't want excessive complexity. We can have things like "turn off
> kernel allocations from this zone", and then wait a day or two, and hope
> that there aren't long-term allocs. It might even work occasionally. But
> the fact is, a number of kernel allocations _are_ long-term (superblocks,
> root dentries, "struct thread_struct" for long-running user daemons), and
> it's simply not going to work well in practice unless you have set aside
> the "no kernel alloc" zone pretty early on.

Exactly. But that's what all the anti-fragmentation stuff was about - trying
to pack unfreeable stuff together.
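
As a toy illustration of that packing argument (not kernel code; the page
states, the 4-pages-per-huge-block ratio and the function name are all
made up for the sketch, and real ratios are far larger):

```python
# Toy model only: a "zone" is a list of page states, nothing more.
FREE, MOVABLE, PINNED = "free", "movable", "pinned"
PAGES_PER_HUGE = 4  # assumed ratio, chosen small for readability


def reclaimable_huge_blocks(zone, pages_per_huge=PAGES_PER_HUGE):
    """Count aligned blocks that contain no pinned (unmovable) page."""
    count = 0
    for start in range(0, len(zone) - pages_per_huge + 1, pages_per_huge):
        block = zone[start:start + pages_per_huge]
        # A single pinned kernel page makes the whole block unusable
        # as a huge page; free and movable pages can be cleared out.
        if PINNED not in block:
            count += 1
    return count


# Four pinned kernel pages scattered, one per block.
scattered = [PINNED, MOVABLE, FREE, MOVABLE] * 4
# The same four pinned pages packed into a single block.
grouped = [PINNED] * 4 + [MOVABLE, FREE, MOVABLE, FREE] * 3

print(reclaimable_huge_blocks(scattered))  # 0
print(reclaimable_huge_blocks(grouped))    # 3
```

Same number of unfreeable pages in both layouts; only the grouped one
leaves blocks that could still be turned into huge pages.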

I don't think anyone is proposing dynamic kernel pointers inside Linux,
except in that we could possibly change the P-V mapping underneath from
the hypervisor, so that the phys address would change, but you wouldn't
see it. Trouble is, that's mostly done at a larger-than-page-size
granularity, so we need SOME larger chunk to switch out (preferably at
least a large-page size, so we can continue to use large TLB entries for
the kernel mapping).

However, the statically sized option is hugely problematic too.

M.