Re: [PATCH V3 0/4] Define coherent device memory node

From: Anshuman Khandual
Date: Wed Mar 08 2017 - 07:13:23 EST


On 03/01/2017 04:29 PM, Balbir Singh wrote:
> On Wed, Mar 1, 2017 at 8:55 PM, Mel Gorman <mgorman@xxxxxxx> wrote:
>> On Wed, Mar 01, 2017 at 01:42:40PM +1100, Balbir Singh wrote:
>>>>>> The idea of this patchset was to introduce
>>>>>> the concept of memory that is not necessarily system memory, but is coherent
>>>>>> in terms of visibility/access with some restrictions
>>>>>>
>>>>> Which should be done without special casing the page allocator, cpusets and
>>>>> special casing how cpusets are handled. It's not necessary for any other
>>>>> mechanism used to restrict access to portions of memory such as cpusets,
>>>>> mempolicies or even memblock reservations.
>>>> Agreed, I mentioned a limitation that we see in cpusets. I do agree that
>>>> we should reuse any infrastructure we have, but cpusets are more static
>>>> in nature and inheritance compared to the requirements of CDM.
>>>>
>>> Mel, I went back and looked at cpusets and found some limitations that
>>> I mentioned earlier. Isolating a particular node requires some amount
>>> of laborious work in terms of isolating all tasks away from the root cpuset
>>> and then creating a hierarchy where the root cpuset is empty and all tasks
>>> belong to a child cpuset that has everything but the node we intend to
>>> isolate. Even with hardwalling, it does not prevent allocations from
>>> the parent cpuset.
>>>
>> That it is difficult does not in itself justify adding a third mechanism
>> specific to one type of device for controlling access to memory.
>>
> Not only is it difficult, but there are several tasks that refuse to
> change cpusets once created. I also noticed that the isolation may
> begin a little too late, some allocations may end up on the node to
> isolate.
>
> I also want to eventually control whether auto-numa
> balancing/kswapd/reclaim etc run on this node (something that cpusets
> do not provide). The reason for these decisions is very dependent on
> the properties of the node. The isolation mechanism that exists today
> is insufficient. Moreover the correct abstraction for device memory
> would be a class similar to N_MEMORY, but limited in what we include
> (which is why I was asking if questions 3 and 4 are clear). You might
> argue these are not NUMA nodes then, but these are, in a general sense,
> NUMA nodes (with non-uniform properties and access times). NUMA allows
> us, with the right hardware, to expose the right programming model.
> Please consider reading the full details at
>
> https://patchwork.kernel.org/patch/9566393/
> https://lkml.org/lkml/2016/11/22/339

As Balbir explained, the cpuset mechanism currently provides only
isolation, which is insufficient for the other properties that a
full-fledged CDM representation requires. A NUMA node is the closest
match for CDM memory, provided a node can carry non-uniform attributes
rather than distance as its only differentiating property. Once CDM is
represented as a NUMA node in the kernel, we can achieve the isolation
requirement either through the buddy allocator changes proposed in this
series or through alternative approaches. As I mentioned in the last
RFC, another way to achieve isolation is through changes to the
zonelist rebuild process and to the mbind() implementation. The two
relevant commits are here:

https://github.com/akhandual/linux/commit/da1093599db29c31d12422a34d4e0cbf4683618f
https://github.com/akhandual/linux/commit/faadab4e9dc9685ab7a564a84d4a06bde8fc79d8

I will post these commits on this thread for further discussion. Please
let me know your views and suggestions on this approach.
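
To make the zonelist idea easier to discuss, here is a rough user-space
toy model in Python. It is not the code from the commits above and makes
several assumptions: the Node class, the is_cdm flag, and the helper
names build_zonelist() and allocate() are all hypothetical. It only
illustrates the intended behaviour: system nodes never list a CDM node
in their fallback order, while an explicit node mask (standing in for an
mbind() request) can still reach it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    nid: int
    is_cdm: bool = False  # hypothetical flag: node holds coherent device memory


def build_zonelist(local, nodes, distance):
    """Return the fallback order of node ids for allocations from `local`.

    CDM nodes are left out of every other node's fallback list, so an
    implicit allocation can never land on them. A CDM node still lists
    itself first and may fall back to system nodes.
    """
    candidates = [n for n in nodes if not n.is_cdm or n.nid == local.nid]
    ordered = sorted(candidates, key=lambda n: distance[local.nid][n.nid])
    return [n.nid for n in ordered]


def allocate(local, nodes, distance, free_pages, nodemask=None):
    """Pick a node for one page, or return None if nothing is available.

    `nodemask` models an explicit mbind()-style request: when given, only
    the listed nodes are tried, which is the one way to reach a CDM node.
    """
    if nodemask is not None:
        order = sorted(nodemask, key=lambda nid: distance[local.nid][nid])
    else:
        order = build_zonelist(local, nodes, distance)
    for nid in order:
        if free_pages[nid] > 0:
            free_pages[nid] -= 1
            return nid
    return None


# Two system nodes (0, 1) and one CDM node (2); distances are made up.
nodes = [Node(0), Node(1), Node(2, is_cdm=True)]
distance = [
    [10, 20, 40],
    [20, 10, 40],
    [40, 40, 10],
]

print(build_zonelist(nodes[0], nodes, distance))  # [0, 1]
print(build_zonelist(nodes[2], nodes, distance))  # [2, 0, 1]
```

In this model, an implicit allocation from node 0 fails rather than
spilling onto node 2 even when nodes 0 and 1 are exhausted, whereas
allocate(..., nodemask=[2]) succeeds. Whether a CDM node should fall
back to system memory at all is a design choice that the model simply
assumes.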