Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup

From: Andrew Morton
Date: Thu May 07 2015 - 18:10:00 EST


On Thu, 7 May 2015 08:25:18 +0100 Mel Gorman <mgorman@xxxxxxx> wrote:

> Waiman Long reported that 24TB machines hit OOM during basic setup when
> struct page initialisation was deferred. One approach is to initialise memory
> on demand but it interferes with page allocator paths. This patch creates
> dedicated threads to initialise memory before basic setup. It then blocks
> on a rw_semaphore until completion as a wait_queue and counter is overkill.
> This may be slower to boot but it's simpler overall and also gets rid of a
> section mangling which existed so kswapd could do the initialisation.

Seems a reasonable compromise. It makes a bit of a mess of the patch
sequencing.
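
For readers following along, the scheme in the quoted changelog (take the
semaphore for read once per initialisation thread, let each thread drop its
hold when it finishes, and have the boot path block on a write acquisition
until all holds are gone) can be sketched in userspace. This is only an
illustration under stated assumptions, not the kernel code: toy_rwsem and its
helpers, init_node(), init_all_nodes(), node_ready[] and NR_NODES are all
hypothetical names, and the write-side wait is inferred from the changelog
rather than taken from the hunks in this mail. A hand-rolled counter is used
because POSIX does not allow a pthread_rwlock_t to be unlocked by a thread
other than the one that locked it, which is exactly what this hand-off needs.

```c
#include <pthread.h>

#define NR_NODES 4	/* hypothetical node count for the demo */

/* Hypothetical stand-in for an rw_semaphore: a read hold taken by one
 * thread may be released by another. */
struct toy_rwsem {
	pthread_mutex_t lock;
	pthread_cond_t drained;
	int readers;
};

/* Statically initialised, in the spirit of DECLARE_RWSEM(). */
static struct toy_rwsem pgdat_init_sem = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

static int node_ready[NR_NODES];

static void toy_down_read(struct toy_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	s->readers++;
	pthread_mutex_unlock(&s->lock);
}

static void toy_up_read(struct toy_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->readers == 0)
		pthread_cond_broadcast(&s->drained);
	pthread_mutex_unlock(&s->lock);
}

/* Block until no read holds remain; returns holding exclusive access. */
static void toy_down_write(struct toy_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->readers > 0)
		pthread_cond_wait(&s->drained, &s->lock);
}

static void toy_up_write(struct toy_rwsem *s)
{
	pthread_mutex_unlock(&s->lock);
}

/* Worker: "initialise" one node, then drop the hold taken on its behalf. */
static void *init_node(void *data)
{
	int *ready = data;

	*ready = 1;
	toy_up_read(&pgdat_init_sem);
	return NULL;
}

/* Start one worker per node, wait for all of them, return how many ran. */
int init_all_nodes(void)
{
	pthread_t tid[NR_NODES];
	int nid, started = 0, ready = 0;

	for (nid = 0; nid < NR_NODES; nid++) {
		node_ready[nid] = 0;
		toy_down_read(&pgdat_init_sem);	/* hold is handed to the worker */
		if (pthread_create(&tid[nid], NULL, init_node,
				   &node_ready[nid]) != 0) {
			toy_up_read(&pgdat_init_sem);	/* no worker will release it */
			break;
		}
		started++;
	}

	toy_down_write(&pgdat_init_sem);	/* waits for every worker */
	for (nid = 0; nid < NR_NODES; nid++)
		ready += node_ready[nid];
	toy_up_write(&pgdat_init_sem);

	for (nid = 0; nid < started; nid++)
		pthread_join(tid[nid], NULL);
	return ready;
}
```

Calling init_all_nodes() from a small main() and building with -pthread is
enough to try it. The point of the design is that the waiter needs no separate
counter or wait queue of its own: the number of outstanding read holds is the
count, and the write acquisition is the wait.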

Have some tweaklets:



From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Subject: mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix

include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast

Cc: Daniel J Blueman <daniel@xxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Nathan Zimmer <nzimmer@xxxxxxx>
Cc: Scott Norton <scott.norton@xxxxxx>
Cc: Waiman Long <waiman.long@xxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

mm/page_alloc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff -puN mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix
+++ a/mm/page_alloc.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/interrupt.h>
+#include <linux/rwsem.h>
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
 #include <linux/bootmem.h>
@@ -1075,12 +1076,12 @@ static void __init deferred_free_range(s
 		__free_pages_boot_core(page, pfn, 0);
 }

-static struct rw_semaphore __initdata pgdat_init_rwsem;
+static __initdata DECLARE_RWSEM(pgdat_init_rwsem);

 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
-	pg_data_t *pgdat = (pg_data_t *)data;
+	pg_data_t *pgdat = data;
 	int nid = pgdat->node_id;
 	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long start = jiffies;
@@ -1096,7 +1097,7 @@ static int __init deferred_init_memmap(v
 		return 0;
 	}

-	/* Bound memory initialisation to a local node if possible */
+	/* Bind memory initialisation thread to a local node if possible */
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(current, cpumask);

@@ -1200,7 +1201,6 @@ void __init page_alloc_init_late(void)
 {
 	int nid;

-	init_rwsem(&pgdat_init_rwsem);
 	for_each_node_state(nid, N_MEMORY) {
 		down_read(&pgdat_init_rwsem);
 		kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
_
