Re: [PATCH 00/35] Cleanup and optimise the page allocator V3

From: Mel Gorman
Date: Mon Mar 16 2009 - 08:02:34 EST

Next message: Andreas Mohr: "Re: [alsa-devel] Bugs on aspire one A150"
Previous message: Balazs Scheidler: "Re: [patch] Re: scheduler oddity [bug?]"
In reply to: Nick Piggin: "Re: [PATCH 00/35] Cleanup and optimise the page allocator V3"
Next in thread: Nick Piggin: "Re: [PATCH 00/35] Cleanup and optimise the page allocator V3"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Mar 16, 2009 at 12:33:58PM +0100, Nick Piggin wrote:
> On Mon, Mar 16, 2009 at 11:19:06AM +0000, Mel Gorman wrote:
> > On Mon, Mar 16, 2009 at 11:40:54AM +0100, Nick Piggin wrote:
> > > That's wonderful, but it would
> > > significantly increase the fragmentation problem, wouldn't it?
> >
> > Not necessarily, anti-fragmentation groups movable pages within a
> > hugepage-aligned block and high-order allocations will trigger a merge of
> > buddies from PAGE_ALLOC_MERGE_ORDER (defined in the relevant patch) up to
> > MAX_ORDER-1. Critically, a merge is also triggered when anti-fragmentation
> > wants to fallback to another migratetype to satisfy an allocation. As
> > long as the grouping works, it doesn't matter if they were only merged up
> > to PAGE_ALLOC_MERGE_ORDER as a full merge will still free up hugepages.
> > So two slow paths are made slower but the fast path should be faster and it
> > should be causing fewer cache line bounces due to writes to struct page.
>
> Oh, but the anti-fragmentation stuff is orthogonal to this. Movable
> groups should always be defragmentable (at some cost)... the bane of
> anti-frag is fragmentation of the non-movable groups.
>

True, the reclaimable area has varying degrees of success and the
non-movable groups are almost unworkable and depend on how much of them
depend on pagetables.

> And one reason why buddy is so good at avoiding fragmentation is
> because it will pick up _any_ pages that go past the allocator if
> they have any free buddies. And it hands out ones that don't have
> free buddies. So in that way it is naturally continually filtering
> out pages which can be merged.
>
> Wheras if you defer this until the point you need a higher order
> page, the only thing you have to work with are the pages that are
> free *right now*.
>

Well, buddy always uses the smallest available page first. Even with
deferred coalescing, it will merge up to order-5 at least. Lets say they
could have merged up to order-10 in ordinary circumstances, they are
still avoided for as long as possible. Granted, it might mean that an
order-5 is split that could have been merged but it's hard to tell how
much of a difference that makes.

> It will almost definitely increase fragmentation of non movable zones,
> and if you have a workload doing non-trivial, non movable higher order
> allocations that are likely to cause fragmentation, it will result
> in these allocations eating movable groups sooner I think.
>

I think the effect will be same unless the non-movable high-order
allocations are order-5 or higher in which case we are likely going to
hit trouble anyway.

>
> > When I last checked (about 10 days) ago, I hadn't damaged anti-fragmentation
> > but that was a lot of revisions ago. I'm redoing the tests to make sure
> > anti-fragmentation is still ok.
>
> Your anti-frag tests probably don't stress this long term fragmentation
> problem.
>

Probably not, but we have little data on long-term fragmentation other than
anecdotal evidence that it's ok these days.

> Still, it's significant enough that I think it should be made
> optional (and arguably default to on) even if it does harm higher
> order allocations a bit.
>

I could make PAGE_ORDER_MERGE_ORDER a proc tunable? If it's placed as a
read-mostly variable beside the gfp_zone table, it might even fit in the
same cache line.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Andreas Mohr: "Re: [alsa-devel] Bugs on aspire one A150"
Previous message: Balazs Scheidler: "Re: [patch] Re: scheduler oddity [bug?]"
In reply to: Nick Piggin: "Re: [PATCH 00/35] Cleanup and optimise the page allocator V3"
Next in thread: Nick Piggin: "Re: [PATCH 00/35] Cleanup and optimise the page allocator V3"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]