Re: [PATCH] mm: skip the page buddy block instead of one page

From: Minchan Kim
Date: Wed Aug 14 2013 - 12:39:33 EST


On Wed, Aug 14, 2013 at 05:16:42PM +0100, Mel Gorman wrote:
> On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote:
> > Hi Mel,
> >
> > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > > A large free buddy block spans many contiguous pages, so if a page
> > > > is free, skip the whole buddy block instead of a single page.
> > > >
> > > > Signed-off-by: Xishi Qiu <qiuxishi@xxxxxxxxxx>
> > >
> > > page_order cannot be used unless zone->lock is held which is not held in
> > > this path. Acquiring the lock would prevent parallel allocations from the
> >
> > Argh, I missed that. And if I remember correctly, you already pointed
> > this out a long time ago when someone tried to do the same thing. :(
>
> It feels familiar but I do not remember why.
>
> > But let's think about it more.
> >
> > It's not always wrong, because CMA and memory-hotplug have already
> > isolated the free pages in the range to MIGRATE_ISOLATE right before
> > starting migration, so we could use page_order safely in those contexts
> > even without holding zone->lock.
> >
>
> Both of those are corner cases. Neither operation happens frequently
> in comparison to something like THP allocations, for example. I think an
> optimisation along those lines is marginal at best.

On the embedded side, we don't use THP yet but do use CMA and memory-hotplug,
so your claim doesn't hold for the embedded world.
And as I said, the CMA area is the last fallback, so it is likely to have
many free pages; the bigger the CMA area is, the bigger the benefit
(CPU and power) is, I guess.

>
> > In addition, a CMA region is likely to have many free pages because CMA
> > makes MIGRATE_CMA a fallback for MIGRATE_MOVABLE to minimize the number
> > of migrations. Even if a CMA area was full, it could have many free pages
> > again once the driver that owns the area releases it. So the bigger the
> > CMA space is, the bigger the patch's benefit. And it could help
> > memory-hotplug, too.
> >
> > The only problem is normal compaction. The worst case is just skipping
> > pageblock_nr_pages, for instance 4M (of course, it depends on the
> > configuration), but we can make the race window very small by
> > double-checking PageBuddy. The race is still theoretically possible, but
> > I think it is really, really unlikely, and we still reduce compaction
> > overhead without holding the lock. Even if the race happens, normal
> > compaction's customers (e.g. THP) see nothing critical and just fall
> > back. So I think it isn't a bad tradeoff.
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 05ccb4c..2341d52 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -520,8 +520,18 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> >  			goto next_pageblock;
> >
> >  		/* Skip if free */
> > -		if (PageBuddy(page))
> > +		if (PageBuddy(page)) {
> > +			/*
> > +			 * page_order is racy without zone->lock, but the
> > +			 * worst case from the race is just skipping
> > +			 * pageblock_nr_pages, and even that is really
> > +			 * unlikely thanks to the double check of
> > +			 * PageBuddy.
> > +			 */
> > +			unsigned long order = page_order(page);
> > +			if (PageBuddy(page))
> > +				low_pfn += (1 << order) - 1;
> >  			continue;
> > +		}
> >
> >
>
> Even if the page is still page buddy, there is no guarantee that it's
> the same page order as the first read. It could currently be merging
> with adjacent buddies, for example. There is also a really small race
> where a page is freed, allocated with some number stuffed into
> page->private, and freed again before the second PageBuddy check.
> It's a bit of a hand grenade. How much of a performance benefit is there?

1. The worst case is just skipping pageblock_nr_pages.
2. The race window is really small.
3. Higher-order allocation customers always have a graceful fallback.

If you really have a concern about that, we can add the following.