Re: [RFC 0/2] Delay initializing of large sections of memory

From: Yinghai Lu
Date: Fri Jun 21 2013 - 15:10:56 EST


On Fri, Jun 21, 2013 at 11:50 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Jun 21, 2013 at 11:44:22AM -0700, Yinghai Lu wrote:
>> On Fri, Jun 21, 2013 at 10:03 AM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>> > On 06/21/2013 09:51 AM, Greg KH wrote:
>> >
>> > I suspect the cutoff for this should be a lot lower than 8 TB even, more
>> > like 128 GB or so. The only concern is to not set the cutoff so low
>> > that we can end up running out of memory or with suboptimal NUMA
>> > placement just because of this.
>>
>> I would suggest another way:
>> only boot the system with boot node (include cpu, ram and pci root buses).
>> then after boot, could add other nodes.
>
> What exactly do you mean by "after boot"? Often, the boot process of
> userspace needs those additional cpus and ram in order to initialize
> everything (like the pci devices) properly.

I mean that on Intel systems each socket has its CPU, memory controller and IIO.
Every IIO is one peer PCI root bus.
So we scan the root buses that do not belong to the boot node later.

This way we can keep NUMA placement etc. in place when we online the RAM, CPUs, PCI...

For example, on a 32-socket system, most of the boot time is spent in the *BIOS*
rather than in the OS. On that kind of system, boot would go like this:
only the first two sockets get booted from the BIOS to the OS;
later, the other sockets are hot-added two at a time.
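
As a rough illustration of the "online it after boot" step, here is a minimal
userspace sketch in Python. It assumes the hot-added memory blocks and CPUs are
already visible under the standard sysfs paths (devices/system/memory/memoryN/state
and devices/system/cpu/cpuN/online) and only need to be switched online. The
function name online_hot_added and its sysfs_root parameter are hypothetical,
chosen for this example. Scanning the PCI root buses of the new sockets is not
covered here.

```python
from pathlib import Path


def online_hot_added(sysfs_root="/sys"):
    """Online every memory block and CPU that is present but offline.

    Hypothetical helper for illustration only. Returns the list of
    sysfs control files that were written.
    """
    root = Path(sysfs_root)
    written = []

    # Memory first, so node-local RAM is usable before the CPUs come up.
    # Each memory block has a 'state' file holding "online" or "offline".
    for state in sorted(root.glob("devices/system/memory/memory*/state")):
        if state.read_text().strip() == "offline":
            state.write_text("online")
            written.append(state)

    # Each hot-pluggable CPU has an 'online' file holding "0" or "1".
    # CPUs without that file cannot be offlined and are skipped by the glob.
    for flag in sorted(root.glob("devices/system/cpu/cpu[0-9]*/online")):
        if flag.read_text().strip() == "0":
            flag.write_text("1")
            written.append(flag)

    return written
```

Run as root against the real /sys, this would online everything that is
currently offline. The sysfs_root argument exists only so the logic can be
tried against a scratch directory first.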

That will also make the BIOS simpler, and it needs to support hot-add for
service purposes anyway.

Yinghai