Re: [RFC v2] memory-hotplug: remove MIGRATE_ISOLATE from free_area->free_list

From: Minchan Kim
Date: Mon Sep 10 2012 - 20:50:49 EST


Hello Wen,

On Fri, Sep 07, 2012 at 03:28:22PM +0800, Wen Congyang wrote:
> At 09/06/2012 10:53 AM, Minchan Kim Wrote:
> > Normally, the MIGRATE_ISOLATE type is used for memory-hotplug.
> > But it is an ironic type: the isolated pages still sit as free
> > pages in free_area->free_list[MIGRATE_ISOLATE], so people can
> > think of them as allocatable pages, but they are *never* allocatable.
> > This ends up skewing the NR_FREE_PAGES vmstat, making it
> > inaccurate, so places that depend on that vmstat
> > could reach the wrong decision.
> >
> > There was already a report about it.[1]
> > [1] 702d1a6e, memory-hotplug: fix kswapd looping forever problem
> >
> > Then there was another report about a different problem.[2]
> > [2] http://www.spinics.net/lists/linux-mm/msg41251.html
> >
> > I believe it can cause problems in the future, too.
> > So I hope to remove this ironic type with another design.
> >
> > I hope this patch solves it, so that we can revert [1] and do not need [2].
> >
> > * Changelog v1
> > * Fixes from Michal's many suggestions
> >
> > Cc: Michal Nazarewicz <mina86@xxxxxxxxxx>
> > Cc: Mel Gorman <mel@xxxxxxxxx>
> > Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> > Cc: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
> > Cc: Wen Congyang <wency@xxxxxxxxxxxxxx>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> > ---
> > It's a very early version which shows the concept, so I still marked it as RFC.
> > I just tested it with a simple test and it works.
> > This patch needs in-depth review from the memory-hotplug guys at Fujitsu
> > because I saw they recently sent lots of patches changing
> > memory-hotplug. Please take a look at this patch.
> >
> > drivers/xen/balloon.c | 2 +
> > include/linux/mmzone.h | 4 +-
> > include/linux/page-isolation.h | 11 ++-
> > mm/internal.h | 3 +
> > mm/memory_hotplug.c | 38 ++++++----
> > mm/page_alloc.c | 33 ++++----
> > mm/page_isolation.c | 162 +++++++++++++++++++++++++++++++++-------
> > mm/vmstat.c | 1 -
> > 8 files changed, 193 insertions(+), 61 deletions(-)
> >
> > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> > index 31ab82f..df0f5f3 100644
> > --- a/drivers/xen/balloon.c
> > +++ b/drivers/xen/balloon.c
> > @@ -50,6 +50,7 @@
> > #include <linux/notifier.h>
> > #include <linux/memory.h>
> > #include <linux/memory_hotplug.h>
> > +#include <linux/page-isolation.h>
> >
> > #include <asm/page.h>
> > #include <asm/pgalloc.h>
> > @@ -268,6 +269,7 @@ static void xen_online_page(struct page *page)
> > else
> > --balloon_stats.balloon_hotplug;
> >
> > + delete_from_isolated_list(page);
> > mutex_unlock(&balloon_mutex);
> > }
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 2daa54f..438bab8 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -57,8 +57,8 @@ enum {
> > */
> > MIGRATE_CMA,
> > #endif
> > - MIGRATE_ISOLATE, /* can't allocate from here */
> > - MIGRATE_TYPES
> > + MIGRATE_TYPES,
> > + MIGRATE_ISOLATE
> > };
> >
> > #ifdef CONFIG_CMA
> > diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> > index 105077a..1ae2cd6 100644
> > --- a/include/linux/page-isolation.h
> > +++ b/include/linux/page-isolation.h
> > @@ -1,11 +1,16 @@
> > #ifndef __LINUX_PAGEISOLATION_H
> > #define __LINUX_PAGEISOLATION_H
> >
> > +extern struct list_head isolated_pages;
> >
> > bool has_unmovable_pages(struct zone *zone, struct page *page, int count);
> > void set_pageblock_migratetype(struct page *page, int migratetype);
> > int move_freepages_block(struct zone *zone, struct page *page,
> > int migratetype);
> > +
> > +void isolate_free_page(struct page *page, unsigned int order);
> > +void delete_from_isolated_list(struct page *page);
> > +
> > /*
> > * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
> > * If specified range includes migrate types other than MOVABLE or CMA,
> > @@ -20,9 +25,13 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> > unsigned migratetype);
> >
> > /*
> > - * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
> > + * Changes MIGRATE_ISOLATE to @migratetype.
> > * target range is [start_pfn, end_pfn)
> > */
> > +void
> > +undo_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
> > + unsigned migratetype);
> > +
> > int
> > undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> > unsigned migratetype);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 3314f79..393197e 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -144,6 +144,9 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> > * function for dealing with page's order in buddy system.
> > * zone->lock is already acquired when we use these.
> > * So, we don't need atomic page->flags operations here.
> > + *
> > + * Page order should be put on page->private because
> > + * memory-hotplug depends on it. Look mm/page_isolation.c.
> > */
> > static inline unsigned long page_order(struct page *page)
> > {
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 3ad25f9..30c36d5 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -410,26 +410,29 @@ void __online_page_set_limits(struct page *page)
> > unsigned long pfn = page_to_pfn(page);
> >
> > if (pfn >= num_physpages)
> > - num_physpages = pfn + 1;
> > + num_physpages = pfn + (1 << page_order(page));
> > }
> > EXPORT_SYMBOL_GPL(__online_page_set_limits);
> >
> > void __online_page_increment_counters(struct page *page)
> > {
> > - totalram_pages++;
> > + totalram_pages += (1 << page_order(page));
> >
> > #ifdef CONFIG_HIGHMEM
> > if (PageHighMem(page))
> > - totalhigh_pages++;
> > + totalhigh_pages += (1 << page_order(page));
> > #endif
> > }
> > EXPORT_SYMBOL_GPL(__online_page_increment_counters);
> >
> > void __online_page_free(struct page *page)
> > {
> > - ClearPageReserved(page);
> > - init_page_count(page);
> > - __free_page(page);
> > + int i;
> > + unsigned long order = page_order(page);
> > + for (i = 0; i < (1 << order); i++)
> > + ClearPageReserved(page + i);
> > + set_page_private(page, 0);
> > + __free_pages(page, order);
> > }
> > EXPORT_SYMBOL_GPL(__online_page_free);
> >
> > @@ -437,26 +440,29 @@ static void generic_online_page(struct page *page)
> > {
> > __online_page_set_limits(page);
> > __online_page_increment_counters(page);
> > + delete_from_isolated_list(page);
> > __online_page_free(page);
> > }
> >
> > static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > void *arg)
> > {
> > - unsigned long i;
> > + unsigned long pfn;
> > + unsigned long end_pfn = start_pfn + nr_pages;
> > unsigned long onlined_pages = *(unsigned long *)arg;
> > - struct page *page;
> > - if (PageReserved(pfn_to_page(start_pfn)))
> > - for (i = 0; i < nr_pages; i++) {
> > - page = pfn_to_page(start_pfn + i);
> > - (*online_page_callback)(page);
> > - onlined_pages++;
> > + struct page *cursor, *tmp;
> > + list_for_each_entry_safe(cursor, tmp, &isolated_pages, lru) {
> > + pfn = page_to_pfn(cursor);
> > + if (pfn >= start_pfn && pfn < end_pfn) {
> > + (*online_page_callback)(cursor);
> > + onlined_pages += (1 << page_order(cursor));
> > }
> > + }
> > +
>
> If the memory is hotplugged, the pages are not in isolated_pages, and they
> can't be onlined.

Hmm, I can't parse your point.
Could you elaborate on it a bit?

>
> > *(unsigned long *)arg = onlined_pages;
> > return 0;
> > }
> >
> > -
> > int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
> > {
> > unsigned long onlined_pages = 0;
> > @@ -954,11 +960,11 @@ repeat:
> > goto failed_removal;
> > }
> > printk(KERN_INFO "Offlined Pages %ld\n", offlined_pages);
> > - /* Ok, all of our target is islaoted.
> > + /* Ok, all of our target is isolated.
> > We cannot do rollback at this point. */
> > offline_isolated_pages(start_pfn, end_pfn);
> > /* reset pagetype flags and makes migrate type to be MOVABLE */
> > - undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
> > + undo_isolate_pageblocks(start_pfn, end_pfn, MIGRATE_MOVABLE);
> > /* removal success */
> > zone->present_pages -= offlined_pages;
> > zone->zone_pgdat->node_present_pages -= offlined_pages;
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index ba3100a..3e516c5 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -721,6 +721,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> > {
> > unsigned long flags;
> > int wasMlocked = __TestClearPageMlocked(page);
> > + int migratetype;
> >
> > if (!free_pages_prepare(page, order))
> > return;
> > @@ -729,8 +730,14 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> > if (unlikely(wasMlocked))
> > free_page_mlock(page);
> > __count_vm_events(PGFREE, 1 << order);
> > - free_one_page(page_zone(page), page, order,
> > - get_pageblock_migratetype(page));
> > +
> > + migratetype = get_pageblock_migratetype(page);
> > + if (likely(migratetype != MIGRATE_ISOLATE))
> > + free_one_page(page_zone(page), page, order,
> > + migratetype);
> > + else
> > + isolate_free_page(page, order);
> > +
> > local_irq_restore(flags);
> > }
> >
> > @@ -906,7 +913,6 @@ static int fallbacks[MIGRATE_TYPES][4] = {
> > [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> > #endif
> > [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */
> > - [MIGRATE_ISOLATE] = { MIGRATE_RESERVE }, /* Never used */
> > };
> >
> > /*
> > @@ -948,8 +954,13 @@ static int move_freepages(struct zone *zone,
> > }
> >
> > order = page_order(page);
> > - list_move(&page->lru,
> > - &zone->free_area[order].free_list[migratetype]);
> > + if (migratetype != MIGRATE_ISOLATE) {
> > + list_move(&page->lru,
> > + &zone->free_area[order].free_list[migratetype]);
> > + } else {
> > + list_del(&page->lru);
> > + isolate_free_page(page, order);
> > + }
> > page += 1 << order;
> > pages_moved += 1 << order;
> > }
> > @@ -1316,7 +1327,7 @@ void free_hot_cold_page(struct page *page, int cold)
> > */
> > if (migratetype >= MIGRATE_PCPTYPES) {
> > if (unlikely(migratetype == MIGRATE_ISOLATE)) {
> > - free_one_page(zone, page, 0, migratetype);
> > + isolate_free_page(page, 0);
> > goto out;
> > }
> > migratetype = MIGRATE_MOVABLE;
> > @@ -5908,7 +5919,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > struct zone *zone;
> > int order, i;
> > unsigned long pfn;
> > - unsigned long flags;
> > /* find the first valid pfn */
> > for (pfn = start_pfn; pfn < end_pfn; pfn++)
> > if (pfn_valid(pfn))
> > @@ -5916,7 +5926,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > if (pfn == end_pfn)
> > return;
> > zone = page_zone(pfn_to_page(pfn));
> > - spin_lock_irqsave(&zone->lock, flags);
> > pfn = start_pfn;
> > while (pfn < end_pfn) {
> > if (!pfn_valid(pfn)) {
> > @@ -5924,23 +5933,15 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > continue;
> > }
> > page = pfn_to_page(pfn);
> > - BUG_ON(page_count(page));
> > - BUG_ON(!PageBuddy(page));
> > order = page_order(page);
> > #ifdef CONFIG_DEBUG_VM
> > printk(KERN_INFO "remove from free list %lx %d %lx\n",
> > pfn, 1 << order, end_pfn);
> > #endif
> > - list_del(&page->lru);
> > - rmv_page_order(page);
> > - zone->free_area[order].nr_free--;
> > - __mod_zone_page_state(zone, NR_FREE_PAGES,
> > - - (1UL << order));
> > for (i = 0; i < (1 << order); i++)
> > SetPageReserved((page+i));
> > pfn += (1 << order);
> > }
> > - spin_unlock_irqrestore(&zone->lock, flags);
> > }
> > #endif
> >
> > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > index 247d1f1..27cf59e 100644
> > --- a/mm/page_isolation.c
> > +++ b/mm/page_isolation.c
> > @@ -8,6 +8,90 @@
> > #include <linux/memory.h>
> > #include "internal.h"
> >
> > +LIST_HEAD(isolated_pages);
> > +static DEFINE_SPINLOCK(lock);
> > +
> > +/*
> > + * Add the page into isolated_pages, which is a list sorted by ascending pfn.
> > + */
> > +static void __add_isolated_page(struct page *page)
> > +{
> > + struct page *cursor;
> > + unsigned long pfn;
> > + unsigned long new_pfn = page_to_pfn(page);
> > +
> > + list_for_each_entry_reverse(cursor, &isolated_pages, lru) {
> > + pfn = page_to_pfn(cursor);
> > + if (pfn < new_pfn)
> > + break;
> > + }
> > +
> > + list_add(&page->lru, &cursor->lru);
> > +}
> > +
> > +/*
> > + * Isolate free page. It is used by memory-hotplug for stealing
> > + * free page from free_area or freeing path of allocator.
> > + */
> > +void isolate_free_page(struct page *page, unsigned int order)
> > +{
> > + unsigned long flags;
> > +
> > + /*
> > + * We increase refcount for further freeing when online_pages
> > + * happens and record order into @page->private so that
> > + * online_pages can know what order page freeing.
> > + */
> > + set_page_refcounted(page);
> > + set_page_private(page, order);
> > +
> > + /* move_freepages already holds zone->lock */
> > + if (PageBuddy(page))
> > + __ClearPageBuddy(page);
> > +
> > + spin_lock_irqsave(&lock, flags);
> > + __add_isolated_page(page);
> > + spin_unlock_irqrestore(&lock, flags);
> > +}
> > +
> > +void delete_from_isolated_list(struct page *page)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&lock, flags);
> > + list_del(&page->lru);
> > + spin_unlock_irqrestore(&lock, flags);
> > +}
> > +
> > +/* free pages in the pageblock which include @page */
> > +static void free_isolated_pageblock(struct page *page)
> > +{
> > + struct page *cursor, *tmp;
> > + unsigned long start_pfn, end_pfn, pfn;
> > + unsigned long flags;
> > + LIST_HEAD(pages);
> > +
> > + start_pfn = page_to_pfn(page);
> > + start_pfn = start_pfn & ~(pageblock_nr_pages-1);
> > + end_pfn = start_pfn + pageblock_nr_pages;
> > +
> > + spin_lock_irqsave(&lock, flags);
> > + list_for_each_entry_safe(cursor, tmp, &isolated_pages, lru) {
> > + pfn = page_to_pfn(cursor);
> > + if (pfn >= end_pfn)
> > + break;
> > + if (pfn >= start_pfn)
> > + list_move(&cursor->lru, &pages);
> > + }
> > + spin_unlock_irqrestore(&lock, flags);
> > +
> > + list_for_each_entry_safe(cursor, tmp, &pages, lru) {
> > + int order = page_order(cursor);
> > + list_del(&cursor->lru);
> > + __free_pages(cursor, order);
> > + }
> > +}
> > +
> > /* called while holding zone->lock */
> > static void set_pageblock_isolate(struct page *page)
> > {
> > @@ -91,13 +175,12 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> > struct zone *zone;
> > unsigned long flags;
> > zone = page_zone(page);
> > +
> > spin_lock_irqsave(&zone->lock, flags);
> > - if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> > - goto out;
> > - move_freepages_block(zone, page, migratetype);
> > - restore_pageblock_isolate(page, migratetype);
> > -out:
> > + if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
> > + restore_pageblock_isolate(page, migratetype);
> > spin_unlock_irqrestore(&zone->lock, flags);
> > + free_isolated_pageblock(page);
> > }
> >
> > static inline struct page *
> > @@ -155,6 +238,30 @@ undo:
> > return -EBUSY;
> > }
> >
> > +void undo_isolate_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
> > + unsigned migratetype)
> > +{
> > + unsigned long pfn;
> > + struct page *page;
> > + struct zone *zone;
> > + unsigned long flags;
> > +
> > + BUG_ON(start_pfn & (pageblock_nr_pages - 1));
> > + BUG_ON(end_pfn & (pageblock_nr_pages - 1));
> > +
> > + for (pfn = start_pfn;
> > + pfn < end_pfn;
> > + pfn += pageblock_nr_pages) {
> > + page = __first_valid_page(pfn, pageblock_nr_pages);
> > + if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> > + continue;
> > + zone = page_zone(page);
> > + spin_lock_irqsave(&zone->lock, flags);
> > + restore_pageblock_isolate(page, migratetype);
> > + spin_unlock_irqrestore(&zone->lock, flags);
> > + }
> > +}
> > +
> > /*
> > * Make isolated pages available again.
> > */
> > @@ -180,30 +287,35 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> > * all pages in [start_pfn...end_pfn) must be in the same zone.
> > * zone->lock must be held before call this.
> > *
> > - * Returns 1 if all pages in the range are isolated.
> > + * Returns true if all pages in the range are isolated.
> > */
> > -static int
> > -__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
> > +static bool
> > +__test_page_isolated_in_pageblock(unsigned long start_pfn, unsigned long end_pfn)
>
> This function fails and the pages can't be offlined in my test. I will investigate
> it if I have time.
>
> Thanks
> Wen Congyang

Thanks for testing, Wen.
I also want to take a look, but I can't right now because of another urgent task.
I will revisit this issue shortly. Thanks!
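
In the meantime, for anyone following the thread, here is a rough userspace
model of how the pfn-sorted isolated_pages list and the range walk in
online_pages_range() are meant to behave in this RFC. It is only a sketch:
struct fake_page, add_isolated() and online_range() are made-up names for
illustration, not kernel code, and there is no locking. It might also be
related to what Wen saw: a pfn range that never went through
isolate_free_page() has no entries on the list, so the walk onlines nothing
for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; NOT the kernel type. */
struct fake_page {
	unsigned long pfn;
	unsigned int order;		/* models page->private */
	struct fake_page *prev, *next;	/* models page->lru */
};

/* Circular list head, modelling LIST_HEAD(isolated_pages). */
static struct fake_page isolated = { 0, 0, &isolated, &isolated };

/* Models __add_isolated_page(): keep the list sorted by ascending pfn. */
static void add_isolated(struct fake_page *page, unsigned int order)
{
	struct fake_page *cursor;

	assert(page != NULL);
	page->order = order;

	/* Walk backwards until an entry with a smaller pfn, or the head. */
	for (cursor = isolated.prev; cursor != &isolated; cursor = cursor->prev)
		if (cursor->pfn < page->pfn)
			break;

	/* Insert right after cursor (after the head if nothing is smaller). */
	page->prev = cursor;
	page->next = cursor->next;
	cursor->next->prev = page;
	cursor->next = page;
}

/*
 * Models the list walk in online_pages_range(): only entries already on
 * the isolated list can be found, unlinked and counted.
 */
static unsigned long online_range(unsigned long start_pfn,
				  unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long onlined = 0;
	struct fake_page *cursor, *tmp;

	for (cursor = isolated.next; cursor != &isolated; cursor = tmp) {
		tmp = cursor->next;	/* safe against unlinking cursor */
		if (cursor->pfn >= start_pfn && cursor->pfn < end_pfn) {
			cursor->prev->next = cursor->next;
			cursor->next->prev = cursor->prev;
			onlined += 1UL << cursor->order;
		}
	}
	return onlined;
}
```

In this model, calling online_range() on a range whose pages were never
passed to add_isolated() returns 0, while a range covered by isolated
entries returns the sum of 1 << order over those entries.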
--
Kind regards,
Minchan Kim