Re: [PATCH 01/10] mm: control memory placement by nodemask for two tier main memory

From: Dan Williams
Date: Sat Mar 23 2019 - 13:21:46 EST


On Fri, Mar 22, 2019 at 9:45 PM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
>
> When running applications on a machine with NVDIMM as a NUMA node,
> memory allocations may end up on the NVDIMM node. This may result in
> silent performance degradation and regressions due to the difference
> in hardware properties.
>
> A DRAM-first policy should be obeyed to prevent surprising
> regressions. Any non-DRAM nodes should be excluded from default
> allocation. Use a nodemask to control memory placement: introduce
> def_alloc_nodemask, which has only DRAM nodes set. Any non-DRAM
> allocation should be requested explicitly through NUMA policy.
>
> In the future we may be able to extract the memory characteristics
> from HMAT or other sources to build up the default allocation
> nodemask. For the time being, however, just distinguish DRAM and PMEM
> (non-DRAM) nodes by the SRAT flag.
>
> Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> ---
> arch/x86/mm/numa.c | 1 +
> drivers/acpi/numa.c | 8 ++++++++
> include/linux/mmzone.h | 3 +++
> mm/page_alloc.c | 18 ++++++++++++++++--
> 4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index dfb6c4d..d9e0ca4 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -626,6 +626,7 @@ static int __init numa_init(int (*init_func)(void))
> nodes_clear(numa_nodes_parsed);
> nodes_clear(node_possible_map);
> nodes_clear(node_online_map);
> + nodes_clear(def_alloc_nodemask);
> memset(&numa_meminfo, 0, sizeof(numa_meminfo));
> WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.memory,
> MAX_NUMNODES));
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index 867f6e3..79dfedf 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -296,6 +296,14 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
> goto out_err_bad_srat;
> }
>
> + /*
> + * Non volatile memory is excluded from zonelist by default.
> + * Only regular DRAM nodes are set in default allocation node
> + * mask.
> + */
> + if (!(ma->flags & ACPI_SRAT_MEM_NON_VOLATILE))
> + node_set(node, def_alloc_nodemask);
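The flag test in the hunk above can be modelled in a few lines of userspace C. This is a hedged sketch, not the kernel code: struct mem_affinity, build_def_alloc_mask() and node_allowed_by_default() are hypothetical names, a 64-bit word stands in for nodemask_t, and SRAT_MEM_NON_VOLATILE is a local copy of bit 2 of the SRAT Memory Affinity flags (ACPI_SRAT_MEM_NON_VOLATILE in the kernel's ACPICA headers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 2 of the SRAT Memory Affinity "Flags" field (ACPI spec).
 * Redefined locally (hypothetical name) so the sketch builds in userspace. */
#define SRAT_MEM_NON_VOLATILE (1u << 2)

#define MAX_NODES 64

/* Hypothetical stand-in for one parsed SRAT memory affinity entry. */
struct mem_affinity {
    int node;       /* proximity domain already mapped to a node id */
    uint32_t flags; /* raw SRAT flags */
};

/* Mirrors the patch: a node enters the default mask only if one of its
 * ranges is NOT flagged non-volatile. A uint64_t stands in for nodemask_t. */
static uint64_t build_def_alloc_mask(const struct mem_affinity *ma, size_t n)
{
    uint64_t mask = 0; /* like nodes_clear(def_alloc_nodemask) */

    for (size_t i = 0; i < n; i++) {
        assert(ma[i].node >= 0 && ma[i].node < MAX_NODES);
        if (!(ma[i].flags & SRAT_MEM_NON_VOLATILE))
            mask |= UINT64_C(1) << ma[i].node; /* like node_set() */
    }
    return mask;
}

/* Returns 1 if default (policy-less) allocations may use this node. */
static int node_allowed_by_default(uint64_t mask, int node)
{
    return (int)((mask >> node) & 1);
}
```

Note that the decision keys only on the persistence bit; nothing in it reflects latency or bandwidth, which is exactly what the reply questions.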

Hmm, no, I don't think we should do this. Especially considering that
current-generation NVDIMMs are energy-backed DRAM, no performance
difference should be assumed from the non-volatile flag.

Why isn't the default SLIT distance sufficient for ensuring a DRAM-first
default policy?
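The SLIT-distance alternative can be sketched the same way. Assuming a made-up four-node distance matrix in which the PMEM nodes (2 and 3) are reported farther away than any DRAM node, ordering fallback nodes by distance already puts DRAM first without a separate mask. fallback_order() is a hypothetical, simplified model of how the kernel orders zonelists by node_distance(); it ignores the extra penalties and load-based tie-breaking in find_next_best_node().

```c
#include <assert.h>

#define NR_NODES 4

/* Hypothetical SLIT-style distance matrix (10 = local, as in ACPI SLIT).
 * Nodes 0-1: DRAM; nodes 2-3: PMEM reported at a larger distance.
 * Values are illustrative only, not taken from any real platform. */
static const int slit[NR_NODES][NR_NODES] = {
    { 10, 21, 28, 31 },
    { 21, 10, 31, 28 },
    { 28, 31, 10, 21 },
    { 31, 28, 21, 10 },
};

/* Fill order[] with the nodes an allocation starting on 'local' would try,
 * nearest first: a selection sort by distance, ties going to the lower
 * node id because only a strictly smaller distance replaces 'best'. */
static void fallback_order(int local, int order[NR_NODES])
{
    int used[NR_NODES] = { 0 };

    assert(local >= 0 && local < NR_NODES);
    for (int i = 0; i < NR_NODES; i++) {
        int best = -1;

        for (int n = 0; n < NR_NODES; n++) {
            if (used[n])
                continue;
            if (best < 0 || slit[local][n] < slit[local][best])
                best = n;
        }
        used[best] = 1; /* an unused node always remains, so best >= 0 */
        order[i] = best;
    }
}
```

With this matrix, node 0 falls back as 0, 1, 2, 3 and node 1 as 1, 0, 3, 2, so PMEM is reached only after all DRAM. The caveat is that this relies entirely on firmware reporting PMEM distances larger than remote DRAM distances.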