Re: [PATCH] Bias the location of pages freed for min_free_kbytes inthe same MAX_ORDER_NR_PAGES blocks

From: Mel Gorman
Date: Sun Mar 18 2007 - 15:06:06 EST


On Sun, 18 Mar 2007, Andrew Morton wrote:

On Sun, 18 Mar 2007 11:35:36 +0000 (GMT) Mel Gorman <mel@xxxxxxxxx> wrote:

But let me leap ahead of myself.

CONFIG_PAGE_GROUP_BY_MOBILITY

Why does this config item exist? It's not good to have some mysterious
knob which affects mm behaviour at compile time. We need to make up our
minds and stick with it.


The configuration item exists because there were concerns over the memory
footprint and cache line footprint. It was introduced to address that
concern and also so that it would be possible to compare the performance
behavior of anti-fragmentation. Your comment rang a bell though so I
searched the archives to see this comment from Andi Kleen;

===
If anything this should be a boot time option or perhaps sysctl, not a
config. In general CONFIGs that change runtime behaviour are evil - just
makes changing the option more painful, causes problems for distribution
users, doesn't make much sense, etc.etc.

Also #ifdef as a documentation device is a really really scary concept.
Yuck.
===

A sysctl would avoid any cache line footprint but not the memory overhead
because the freelists in struct zone as those freelists would still exist.
I could make the option depend on CONFIG_EMBEDDED for the zone overhead.
Would that make sense or would it be preferable to ditch the option
altogether?

I'll start looking at doing a sysctl so it can be disabled at runtime if
necessary. I strongly suspect that it cannot be enabled again once
disabled but I don't see that as a problem as such.

How much additional memory consumption are we expecting here?


Short answer, about 1.5KB on a 1GB system of which 1.3KB is statically defined in the 3 struct zones on a 1 node x86 system.

Longer answer that I hopefully have not made any mistakes in - There is the zone overhead which is statically sized and a runtime overhead which depends on the amount of memory in the system. The additional zone overhead is the overhead for additional freelists (larger struct free_area) and is as follows;

(MIGRATE_TYPES-1) * sizeof(list_head) * (MAX_ORDER-1)

so, on 32 bit in general, thats

4 * 8 * 10 = 320 bytes per zone (would be 240 bytes if MIGRATE_RESERVE is
sufficient for higher order allocations
instead of MIGRATE_HIGHALLOC)

on x86 with DMA, Normal and HighMem, thats 1280 bytes. On a NUMA system, it's 1280 bytes per node. On 64 bit, it would be double because of the larger pointer size. At worst, I guess you are looking at 3KB per node.

The size of the bitmap for the flags depends on the amount of memory. On FLATMEM and DISCONTIG, it's 3 bits per MAX_ORDER_NR_PAGES spanned by the zones (note spanned, not present). On a 1GiB system without holes on an x86 with MAX_ORDER of 11, that would be 96 bytes. The calculation is more complex for sparsemem but will be at least 4 bytes per active section for a pointer and another 4 bytes for most bitmaps. I have a patch that uses the pointer itself when the bitmap is small enough to be contained in 4 bytes (which is usually is) so it could be just 4 bytes per active section (4 bytes per 16MB on powerpc for example).

Whether it's runtime or compile-time, the optionality is not good.


The main situation where I think it would be desirable to disable anti-fragmentation is for zones that are smaller than MIGRATE_TYPES*MAX_ORDER_NR_PAGES and this is for performance reasons to avoid regularly entering __rmqueue_fallback(). However, that situation could be automatically detected and handled by having allocflags_to_migratetype() always return MIGRATE_UNMOVABLE for small zones and similar for get_pageblock_migratetype(). There would be no need for a sysctl.

If a sysctl was available for cases where profiles showed a problem with the bitmap for example, it would work by having allocflags_to_migratetype() and get_pageblock_migratetype() always return MIGRATE_UNMOVABLE so it would not be totally different code paths that are being executed in the enabled/disabled cases.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/