Re: [RFC V2 03/12] mm: Change generic FALLBACK zonelist creation process

From: Anshuman Khandual
Date: Wed Feb 01 2017 - 01:46:34 EST


On 01/31/2017 12:55 PM, John Hubbard wrote:
> On 01/30/2017 05:57 PM, Dave Hansen wrote:
>> On 01/30/2017 05:36 PM, Anshuman Khandual wrote:
>>>> Let's say we had a CDM node with 100x more RAM than the rest of the
>>>> system and it was just as fast as the rest of the RAM. Would we still
>>>> want it isolated like this? Or would we want a different policy?
>>>
>>> But then the other argument is: don't we want to keep this 100x more
>>> memory isolated for some special purpose, to be utilized by specific
>>> applications?
>>
>> I was thinking that in this case, we wouldn't even want to bother with
>> having "system RAM" in the fallback lists. A device that got its memory
>> usage off by 1% could start to starve the rest of the system. A sane
>> policy in this case might be to isolate the "system RAM" from the
>> device's.
>
> I also don't like having these policies hard-coded, and your 100x
> example above helps clarify what can go wrong about it. It would be
> nicer if, instead, we could better express the "distance" between nodes
> (bandwidth, latency, relative to sysmem, perhaps), and let the NUMA
> system figure out the Right Thing To Do.
>
> I realize that this is not quite possible with NUMA just yet, but I
> wonder if that's a reasonable direction to go with this?

That would be a complete overhaul of the NUMA representation in the
kernel. What CDM attempts is to find a solution within the existing NUMA
framework, with as little code change as possible.