Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

From: Anshuman Khandual
Date: Thu Dec 21 2017 - 22:10:12 EST


On 12/14/2017 07:40 AM, Ross Zwisler wrote:
> ==== Quick Summary ====
>
> Platforms exist today which have multiple types of memory attached to a
> single CPU. These disparate memory ranges have some characteristics in
> common, such as CPU cache coherence, but they can have wide ranges of
> performance both in terms of latency and bandwidth.

Right.

>
> For example, consider a system that contains persistent memory, standard
> DDR memory and High Bandwidth Memory (HBM), all attached to the same CPU.
> There could potentially be an order of magnitude or more difference in
> performance between the slowest and fastest memory attached to that CPU.

Right.

>
> With the current Linux code NUMA nodes are CPU-centric, so all the memory
> attached to a given CPU will be lumped into the same NUMA node. This makes
> it very difficult for userspace applications to understand the performance
> of different memory ranges on a given CPU.

Right, but that might require fundamental changes to the NUMA representation.
Plugging this memory in as separate NUMA nodes, identifying the nodes through
sysfs, and allocating from them through mbind() seems like a short-term
solution.

Though if we decide to go in this direction, a sysfs interface or something
similar is required to enumerate memory properties.

>
> We solve this issue by providing userspace with performance information on
> individual memory ranges. This performance information is exposed via
> sysfs:
>
> # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
> mem_tgt2/firmware_id:1
> mem_tgt2/is_cached:0
> mem_tgt2/local_init/read_bw_MBps:40960
> mem_tgt2/local_init/read_lat_nsec:50
> mem_tgt2/local_init/write_bw_MBps:40960
> mem_tgt2/local_init/write_lat_nsec:50

I might have missed discussions from earlier versions, but why do we have
this kind of "source --> target" model? Will we list properties for every
possible "source --> target" pair on the system? Right now it shows only
bandwidth and latency properties. Can it accommodate other properties as
well in the future?
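
To make the extensibility question concrete, here is a minimal userspace
sketch, assuming the attribute listing stays in the `path:value` form shown
in the quoted grep output. It treats every file as an opaque key, so a
property added later would simply appear as another key. The names
`parse_listing` and `SAMPLE` are hypothetical and not part of the patch set.

```python
# SAMPLE is copied from the quoted `grep .` output; nothing here reads sysfs.
SAMPLE = """\
mem_tgt2/firmware_id:1
mem_tgt2/is_cached:0
mem_tgt2/local_init/read_bw_MBps:40960
mem_tgt2/local_init/read_lat_nsec:50
mem_tgt2/local_init/write_bw_MBps:40960
mem_tgt2/local_init/write_lat_nsec:50
"""


def parse_listing(text):
    """Turn `path:value` lines into a nested dict keyed by path component."""
    tree = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only: left is the path, right is the value.
        path, _, value = line.partition(":")
        *dirs, leaf = path.split("/")
        node = tree
        for d in dirs:
            node = node.setdefault(d, {})
        # Keep plain decimal values as int; anything else stays a string,
        # so unknown future attributes are preserved rather than rejected.
        node[leaf] = int(value) if value.isdigit() else value
    return tree


attrs = parse_listing(SAMPLE)
print(attrs["mem_tgt2"]["local_init"]["read_bw_MBps"])
```

A consumer written this way does not care which properties exist, which is
why the set of attributes and the "source --> target" layout matter more
than the parsing itself.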

>
> This allows applications to easily find the memory that they want to use.
> We expect that the existing NUMA APIs will be enhanced to use this new
> information so that applications can continue to use them to select their
> desired memory.

I presented a proposal for a NUMA redesign at the Plumbers Conference this
year, in which memory devices with different kinds of memory attributes
can be represented in the kernel and used explicitly from user space.
Here is the link to the proposal if you are interested. The proposal is
very intrusive, and I don't have an RFC for it yet for discussion here.

https://linuxplumbersconf.org/2017/ocw//system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf

The problem is that designing the sysfs interface for memory attribute
detection from user space without first thinking about redesigning NUMA for
heterogeneous memory may not be a good idea. I will look into this further.