Re: [PATCH 0/13] Parallel struct page initialisation v4

From: Andrew Morton
Date: Mon May 04 2015 - 17:30:58 EST


On Fri, 01 May 2015 20:09:21 -0400 Waiman Long <waiman.long@xxxxxx> wrote:

> On 05/01/2015 06:02 PM, Waiman Long wrote:
> >
> > Bad news!
> >
> > I tried your patch on a 24-TB DragonHawk and got an out of memory
> > panic. The kernel log messages were:
>
> ...
>
> > [ 81.360287] [<ffffffff8151b0c9>] dump_stack+0x68/0x77
> > [ 81.365942] [<ffffffff8151ae1e>] panic+0xb9/0x219
> > [ 81.371213] [<ffffffff810785c3>] ? __blocking_notifier_call_chain+0x63/0x80
> > [ 81.378971] [<ffffffff811384ce>] __out_of_memory+0x34e/0x350
> > [ 81.385292] [<ffffffff811385ee>] out_of_memory+0x5e/0x90
> > [ 81.391230] [<ffffffff8113ce9e>] __alloc_pages_slowpath+0x6be/0x740
> > [ 81.398219] [<ffffffff8113d15c>] __alloc_pages_nodemask+0x23c/0x250
> > [ 81.405212] [<ffffffff81186346>] kmem_getpages+0x56/0x110
> > [ 81.411246] [<ffffffff81187f44>] fallback_alloc+0x164/0x200
> > [ 81.417474] [<ffffffff81187cfd>] ____cache_alloc_node+0x8d/0x170
> > [ 81.424179] [<ffffffff811887bb>] kmem_cache_alloc_trace+0x17b/0x240
> > [ 81.431169] [<ffffffff813d5f3a>] init_memory_block+0x3a/0x110
> > [ 81.437586] [<ffffffff81b5f687>] memory_dev_init+0xd7/0x13d
> > [ 81.443810] [<ffffffff81b5f2af>] driver_init+0x2f/0x37
> > [ 81.449556] [<ffffffff81b1599b>] do_basic_setup+0x29/0xd5
> > [ 81.455597] [<ffffffff81b372c4>] ? sched_init_smp+0x140/0x147
> > [ 81.462015] [<ffffffff81b15c55>] kernel_init_freeable+0x20e/0x297
> > [ 81.468815] [<ffffffff81512ea0>] ? rest_init+0x80/0x80
> > [ 81.474565] [<ffffffff81512ea9>] kernel_init+0x9/0xf0
> > [ 81.480216] [<ffffffff8151f788>] ret_from_fork+0x58/0x90
> > [ 81.486156] [<ffffffff81512ea0>] ? rest_init+0x80/0x80
> > [ 81.492350] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
> > [ 81.492350]
> >
> > -Longman
>
> I increased the pre-initialized memory per node in update_defer_init()
> of mm/page_alloc.c from 2G to 4G. Now I am able to boot the 24-TB
> machine without error. The 12-TB machine has 0.75TB/node, while the 24-TB
> machine has 1.5TB/node. I would suggest something like pre-initializing
> 1G per 0.25TB/node. In this way, it will scale properly with the memory
> size.
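
The scaling rule suggested above (1G of pre-initialised memory per 0.25TB of node memory, instead of a fixed 2G) could look roughly like the helper below. This is a userspace sketch, not code from the patch: `preinit_bytes_for_node()`, `ONE_GIB` and `QUARTER_TIB` are hypothetical names, 0.25TB is taken as 256G, and rounding a partial 0.25TB slice up to a full 1G is an assumption of this sketch rather than part of the suggestion.

```c
#include <stdint.h>

/* Hypothetical constants for this sketch (binary units). */
#define ONE_GIB     (UINT64_C(1) << 30)  /* 1G */
#define QUARTER_TIB (UINT64_C(1) << 38)  /* 0.25TB == 256G */

/*
 * Hypothetical helper: how many bytes of a node to initialise up front.
 * Gives 1G per 0.25TB of node memory; a partial 0.25TB slice is rounded
 * up to a whole slice (an assumption, not part of the original suggestion).
 */
static uint64_t preinit_bytes_for_node(uint64_t node_bytes)
{
    uint64_t slices = (node_bytes + QUARTER_TIB - 1) / QUARTER_TIB;
    return slices * ONE_GIB;
}
```

Under this rule the 12-TB machine (0.75TB/node) would pre-initialise 3G per node and the 24-TB machine (1.5TB/node) 6G per node, both above the 2G and 4G figures reported to boot those machines.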

We're using more than 2G before we've even completed do_basic_setup()?
Where did it all go?

> Before the patch, the boot time from elilo prompt to ssh login was 694s.
> After the patch, the boot up time was 346s, a saving of 348s (about 50%).

Having to guesstimate the amount of memory which is needed for a
successful boot will be painful. Any number we choose will be wrong
99% of the time.

If the kswapd threads have started, all we need to do is to wait: take
a little nap in the allocator's page==NULL slowpath.
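
A rough userspace model of the "wait instead of OOM" idea, assuming (as the remark implies) that the kswapd threads are the ones initialising the deferred struct pages. Everything here is hypothetical: a pthread stands in for kswapd, two counters stand in for a node's initialised and allocated pages, and `alloc_page_or_wait()` is not a kernel function. On a failed allocation the caller naps and retries while initialisation is still in progress, and reports failure only once initialisation has finished and memory is genuinely exhausted.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define TOTAL_PAGES 4096 /* hypothetical node size, in pages */
#define BATCH_PAGES 256  /* pages made available per initialiser step */

static atomic_size_t pages_initialised; /* pages usable so far */
static atomic_size_t pages_allocated;   /* pages handed out so far */
static atomic_bool initialiser_running;

/* Stand-in for kswapd's deferred init: release pages a batch at a time. */
static void *deferred_init_thread(void *arg)
{
    const struct timespec batch_delay = { .tv_sec = 0, .tv_nsec = 100000 };
    (void)arg;
    for (size_t done = 0; done < TOTAL_PAGES; done += BATCH_PAGES) {
        nanosleep(&batch_delay, NULL);
        atomic_fetch_add(&pages_initialised, BATCH_PAGES);
    }
    /* Published after the last batch, so readers seeing false see all pages. */
    atomic_store(&initialiser_running, false);
    return NULL;
}

static int start_deferred_init(pthread_t *tid)
{
    atomic_store(&initialiser_running, true);
    return pthread_create(tid, NULL, deferred_init_thread, NULL);
}

/* Fast path: claim one page if an initialised page is still free. */
static bool try_alloc_page(void)
{
    size_t used = atomic_load(&pages_allocated);
    while (used < atomic_load(&pages_initialised)) {
        /* On failure 'used' is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&pages_allocated, &used, used + 1))
            return true;
    }
    return false;
}

/* Slowpath: nap and retry while initialisation is still in progress. */
static bool alloc_page_or_wait(void)
{
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 50000 };
    for (;;) {
        if (try_alloc_page())
            return true;
        if (!atomic_load(&initialiser_running))
            return try_alloc_page(); /* last try; failure is a real OOM */
        nanosleep(&nap, NULL);
    }
}
```

The final `try_alloc_page()` after seeing the initialiser stop closes the window where the last batch lands between the two checks; only a failure after that point corresponds to a genuine out-of-memory condition.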

I'm not seeing any reason why we can't start kswapd much earlier -
right at the start of do_basic_setup()?