Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc

From: Michael Ellerman
Date: Mon May 29 2017 - 01:32:47 EST


Reza Arbab <arbab@xxxxxxxxxxxxxxxxxx> writes:

> On Fri, May 26, 2017 at 01:46:58PM +1000, Michael Ellerman wrote:
>>Reza Arbab <arbab@xxxxxxxxxxxxxxxxxx> writes:
>>
>>> On Thu, May 25, 2017 at 04:19:53PM +1000, Michael Ellerman wrote:
>>>>The commit message for 3af229f2071f says:
>>>>
>>>> In practice, we never see a system with 256 NUMA nodes, and in fact, we
>>>> do not support node hotplug on power in the first place, so the nodes
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> that are online when we come up are the nodes that will be present for
>>>> the lifetime of this kernel.
>>>>
>>>>Is that no longer true?
>>>
>>> I don't know what the reasoning behind that statement was at the time,
>>> but as far as I can tell, the only thing missing for node hotplug now is
>>> Balbir's patchset [1]. He fixes the resource issue which motivated
>>> 3af229f2071f and reverts it.
>>>
>>> With that set, I can instantiate a new numa node just by doing
>>> add_memory(nid, ...) where nid doesn't currently exist.
>>
>>But does that actually happen on any real system?
>
> I don't know if anything currently tries to do this. My interest in
> having this working is so that in the future, our coherent gpu memory
> could be added as a distinct node by the device driver.

Sure. If/when that happens, we would hopefully still have some way to
limit the size of the possible map.

That would ideally be a firmware property that tells us the maximum
number of GPUs that might be hot-added, or we punt and cap it at some
"sane" maximum number.
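
Something like the following is roughly the shape of it. This is a sketch
only: every name here (NODE_LIMIT, SANE_HOTADD_NODES, possible_node_count
and its parameters) is made up for illustration and is not existing kernel
code, and the fallback of 8 is just an assumed placeholder.

```c
#include <stdbool.h>

/* Hypothetical values, for illustration only. */
#define NODE_LIMIT        256  /* the 256-node upper bound mentioned in this thread */
#define SANE_HOTADD_NODES 8    /* assumed fallback cap when firmware says nothing */

/*
 * Return how many nodes to mark possible: the nodes online at boot plus
 * headroom for hot-added ones, never exceeding NODE_LIMIT.
 *
 * fw_has_max / fw_max_hotadd stand in for a firmware property that does
 * not exist today.
 */
static unsigned int possible_node_count(unsigned int online_at_boot,
                                        bool fw_has_max,
                                        unsigned int fw_max_hotadd)
{
    /* Prefer the firmware-provided limit, otherwise punt to the cap. */
    unsigned int extra = fw_has_max ? fw_max_hotadd : SANE_HOTADD_NODES;

    if (online_at_boot >= NODE_LIMIT)
        return NODE_LIMIT;

    /* Clamp the headroom so the total stays within NODE_LIMIT. */
    if (extra > NODE_LIMIT - online_at_boot)
        extra = NODE_LIMIT - online_at_boot;

    return online_at_boot + extra;
}
```

With 2 nodes online at boot and firmware advertising room for 4 more, the
possible map would cover 6 nodes rather than all 256.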

But until that happens it's silly to say we can have up to 256 nodes
when in practice most of our systems have 8 or fewer.

So I'm still waiting for an explanation from Michael B on how he's
seeing this bug in practice.

cheers