[Patch:004/004] wait_table and zonelist initializing for memory hotadd (update zonelists)

From: Yasunori Goto
Date: Wed Apr 05 2006 - 07:03:10 EST


In current code, zonelist is considered to be build once, no modification.
But MemoryHotplug can add new zone/pgdat. It must be updated.

This patch modifies build_all_zonelists().
By this, build_all_zonelist() can reconfig pgdat's zonelists.

To update them safety, this patch use stop_machine_run().
Other cpus don't touch among updating them by using it.

In previous version (V2), kernel updated them after zone initialization.
But present_page of its new zone is still 0, because online_page()
is not called yet at this time.
Build_zonelists() checks present_pages to find present zone.
It was too early. So, I changed it after online_pages().

Signed-off-by: Yasunori Goto <y-goto@xxxxxxxxxxxxxx>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>

mm/memory_hotplug.c | 12 ++++++++++++
mm/page_alloc.c | 26 +++++++++++++++++++++-----
2 files changed, 33 insertions(+), 5 deletions(-)

Index: pgdat10/mm/page_alloc.c
===================================================================
--- pgdat10.orig/mm/page_alloc.c 2006-04-04 20:42:51.000000000 +0900
+++ pgdat10/mm/page_alloc.c 2006-04-04 20:42:52.000000000 +0900
@@ -37,6 +37,7 @@
#include <linux/nodemask.h>
#include <linux/vmalloc.h>
#include <linux/mempolicy.h>
+#include <linux/stop_machine.h>

#include <asm/tlbflush.h>
#include "internal.h"
@@ -1762,14 +1763,29 @@ static void __init build_zonelists(pg_da

#endif /* CONFIG_NUMA */

-void __init build_all_zonelists(void)
+/* return values int ....just for stop_machine_run() */
+static int __meminit __build_all_zonelists(void *dummy)
{
- int i;
+ int nid;
+ for_each_online_node(nid)
+ build_zonelists(NODE_DATA(nid));
+ return 0;
+}
+
+void __meminit build_all_zonelists(void)
+{
+ if (system_state == SYSTEM_BOOTING) {
+ __build_all_zonelists(0);
+ cpuset_init_current_mems_allowed();
+ } else {
+ /* we have to stop all cpus to guaranntee there is no user
+ of zonelist */
+ stop_machine_run(__build_all_zonelists, NULL, NR_CPUS);
+ /* cpuset refresh routine should be here */
+ }

- for_each_online_node(i)
- build_zonelists(NODE_DATA(i));
printk("Built %i zonelists\n", num_online_nodes());
- cpuset_init_current_mems_allowed();
+
}

/*
Index: pgdat10/mm/memory_hotplug.c
===================================================================
--- pgdat10.orig/mm/memory_hotplug.c 2006-04-04 20:42:49.000000000 +0900
+++ pgdat10/mm/memory_hotplug.c 2006-04-04 20:42:52.000000000 +0900
@@ -123,6 +123,7 @@ int online_pages(unsigned long pfn, unsi
unsigned long flags;
unsigned long onlined_pages = 0;
struct zone *zone;
+ int need_zonelists_rebuild = 0;

/*
* This doesn't need a lock to do pfn_to_page().
@@ -135,6 +136,14 @@ int online_pages(unsigned long pfn, unsi
grow_pgdat_span(zone->zone_pgdat, pfn, pfn + nr_pages);
pgdat_resize_unlock(zone->zone_pgdat, &flags);

+ /*
+ * If this zone is not populated, then it is not in zonelist.
+ * This means the page allocator ignores this zone.
+ * So, zonelist must be updated after online.
+ */
+ if (!populated_zone(zone))
+ need_zonelists_rebuild = 1;
+
for (i = 0; i < nr_pages; i++) {
struct page *page = pfn_to_page(pfn + i);
online_page(page);
@@ -145,5 +154,8 @@ int online_pages(unsigned long pfn, unsi

setup_per_zone_pages_min();

+ if (need_zonelists_rebuild)
+ build_all_zonelists();
+
return 0;
}

--
Yasunori Goto


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/