Re: [RFC 1/2] x86_64, mm: Delay initializing large portion of memory

From: Rob Landley
Date: Tue Jun 25 2013 - 00:14:55 EST


On 06/21/2013 11:25:33 AM, Nathan Zimmer wrote:
On a 16TB system it can take upwards of two hours to boot the system with
about 60% of the time being spent initializing memory. This patch delays
initializing a large portion of memory until after the system is booted.
This can significantly reduce the time it takes to boot the system, down
to the 15 to 30 minute range.

Why is this conditional? Initialize the minimum amount of memory to bring up each NUMA node, and then have each processor initialize its own memory. I would have thought it was already doing this...
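
As a rough user-space analogy of that split (not kernel code), the sketch below hands each "node" its own slice of a buffer and lets one worker thread per node initialize it. It assumes the small amount of memory needed to bring up each node is already initialized and only shows the deferred part. NR_NODES, struct node_range, init_node_range() and init_memory_parallel() are hypothetical names chosen for illustration; none of them come from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NR_NODES 4	/* hypothetical node count, for illustration only */

/* One node's share of the memory whose initialization was deferred. */
struct node_range {
	unsigned char *start;
	size_t len;
};

/* Worker: each "node" initializes only its own range. */
static void *init_node_range(void *arg)
{
	struct node_range *r = arg;

	memset(r->start, 0xAA, r->len);
	return NULL;
}

/*
 * Initialize `total` bytes at `mem` using one worker thread per node.
 * Returns 0 on success or the pthread_create() error code.
 */
static int init_memory_parallel(unsigned char *mem, size_t total)
{
	pthread_t tid[NR_NODES];
	struct node_range range[NR_NODES];
	size_t chunk = total / NR_NODES;
	int i, started = 0, err = 0;

	assert(mem != NULL);

	for (i = 0; i < NR_NODES; i++) {
		range[i].start = mem + (size_t)i * chunk;
		/* The last node also takes whatever the division left over. */
		range[i].len = (i == NR_NODES - 1) ?
			       total - (size_t)i * chunk : chunk;
		err = pthread_create(&tid[i], NULL, init_node_range, &range[i]);
		if (err)
			break;
		started++;
	}

	/* Wait for every worker that was actually started. */
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return err;
}
```

A caller would pass the deferred region and its size, e.g. init_memory_parallel(buf, sizeof(buf)). In a real kernel the workers would be bound to CPUs on the node that owns the range; that placement is left out here.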


+ delay_mem_init=B:M:n:l:h
+ This delays the initialization of a large portion of
+ memory by inserting it into the "absent" memory list.
+ This allows the system to boot up much faster at the
+ expense of the time needed to add this absent memory
+ after the system has booted. That however can be done
+ in parallel with other operations.

This seems like a giant advertisement primarily aimed at repeating why you think we need to merge the patch, not explaining what it is or how to use it.

I would rephrase:

Defer memory initialization until after SMP init (so
large memory ranges can be initialized in parallel) by
moving memory not needed during boot to the "absent" list.

And I repeat: why do we need to micromanage this? It sounds like all NUMA systems should do something like this. (Single-threaded memory initialization in an SMP system is kind of weird.)

+ Format: B:M:n:l:h
+ (1 << B) is the block size (bsize)
+ ['0' indicates use the default 128M]
+ (1 << M) is the address space per node
+ (n * bsize) is minimum sized node memory to slice
+ (l * bsize) is low memory to leave on node
+ (h * bsize) is high memory to leave on node

I don't understand this in the slightest. I understand "low memory to leave on the node", I have no idea why there are four other parameters.
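
For reference, one literal reading of the quoted format can be sketched as a small user-space parser. This is an assumption-laden illustration, not the patch's code: struct delay_mem_init_params and parse_delay_mem_init() are hypothetical names, the field meanings are taken only from the documentation text quoted above, and there is no overflow checking on the multiplications.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the five fields of "B:M:n:l:h". */
struct delay_mem_init_params {
	uint64_t bsize;		/* block size in bytes: 1 << B, or 128M if B == 0 */
	uint64_t node_span;	/* address space per node in bytes: 1 << M */
	uint64_t min_node_mem;	/* n * bsize: minimum sized node memory to slice */
	uint64_t low_keep;	/* l * bsize: low memory to leave on node */
	uint64_t high_keep;	/* h * bsize: high memory to leave on node */
};

/*
 * Parse "B:M:n:l:h" into *p. Returns 0 on success, or -EINVAL if the
 * string does not hold exactly five colon-separated numbers or a shift
 * count would not fit in 64 bits.
 */
static int parse_delay_mem_init(const char *arg,
				struct delay_mem_init_params *p)
{
	unsigned int b, m, n, l, h;
	char trailing;

	assert(arg != NULL && p != NULL);

	/* The extra %c only converts if junk follows the fifth number. */
	if (sscanf(arg, "%u:%u:%u:%u:%u%c", &b, &m, &n, &l, &h, &trailing) != 5)
		return -EINVAL;
	if (b >= 64 || m >= 64)
		return -EINVAL;

	/* '0' for B selects the documented default of 128M. */
	p->bsize = b ? (UINT64_C(1) << b) : (UINT64_C(128) << 20);
	p->node_span = UINT64_C(1) << m;
	p->min_node_mem = (uint64_t)n * p->bsize;
	p->low_keep = (uint64_t)l * p->bsize;
	p->high_keep = (uint64_t)h * p->bsize;
	return 0;
}
```

Under this reading, a string such as "0:40:8:1:2" would mean 128M blocks, 1TB of address space per node, nodes smaller than 8 blocks not sliced, and 1 block of low and 2 blocks of high memory left on each node.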


+config DELAY_MEM_INIT
+ bool "Delay memory initialization"
+ depends on EFI && MEMORY_HOTPLUG_SPARSE
+ ---help---
+ This option delays initializing a large portion of memory
+ until after the system is booted. This can significantly
+ reduce the time it takes to boot the system when there
+ is a significant amount of memory present. Systems with
+ 8TB or more of memory benefit the most.

I can see an SMP phone wanting to use this to shave a quarter second off its boot time. Your "large portion of memory" description is a bit myopic.

Rob