RE: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake

From: Justin He
Date: Mon Jul 06 2020 - 08:48:07 EST


Hi Jonathan, thanks for the comments.

> -----Original Message-----
> From: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Sent: Monday, July 6, 2020 6:46 PM
> To: Justin He <Justin.He@xxxxxxx>
> Cc: Catalin Marinas <Catalin.Marinas@xxxxxxx>; Will Deacon
> <will@xxxxxxxxxx>; Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>; Mike
> Rapoport <rppt@xxxxxxxxxxxxx>; Baoquan He <bhe@xxxxxxxxxx>; Chuhong Yuan
> <hslester96@xxxxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; Kaly Xin <Kaly.Xin@xxxxxxx>
> Subject: Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node
> is fake
>
> On Mon, 6 Jul 2020 11:29:21 +0100
> Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> > On Mon, 6 Jul 2020 09:19:45 +0800
> > Jia He <justin.he@xxxxxxx> wrote:
> >
> > Hi,
> >
> > > Previously, numa_off is set to true unconditionally in
> dummy_numa_init(),
> > > even if there is a fake numa node.
> > >
> > > But acpi will translate node id to NUMA_NO_NODE(-1) in
> acpi_map_pxm_to_node()
> > > because it regards numa_off as turning off the numa node.
> >
> > That is correct. It is operating exactly as it should, if SRAT hasn't
> been parsed
> > and you are on ACPI platform there are no nodes. They cannot be created
> at
> > some later date. The dummy code doesn't change this. It just does
> enough to carry
> > on operating with no specified nodes.
> >
> > >
> > > Without this patch, pmem can't be probed as a RAM device on arm64 if
> SRAT table
> > > isn't present.
> > >
> > > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g
> -a 64K
> > > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> > > kmem: probe of dax0.0 failed with error -22
> > >
> > > This fixes it by setting numa_off to false.
> >
> > Without the SRAT protection patch [1] you may well run into problems

Sorry, I don't quite understand this. Do you mean your [1] can resolve this
issue? acpi_map_pxm_to_node() has already returned NUMA_NO_NODE after the
following check:
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS || numa_off)
		return NUMA_NO_NODE;
So it seems that even with your patch [1] applied, this would not help?
Please correct me if my understanding is wrong.
[1] https://patchwork.kernel.org/patch/11632063/
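
To make the failure path concrete, here is a minimal user-space sketch of
what I think happens. It is only a model, not the kernel code: the function
names model_map_pxm_to_node() and model_kmem_probe() are hypothetical, the
value of MAX_PXM_DOMAINS is illustrative, and the real pxm-to-node table is
replaced by an identity mapping. The only parts taken from this thread are
the early-return check and the "invalid node: -1" / error -22 result.

```c
#include <errno.h>
#include <stdbool.h>

#define NUMA_NO_NODE    (-1)
#define MAX_PXM_DOMAINS 256	/* illustrative value (assumption) */

/*
 * Hypothetical stand-in for acpi_map_pxm_to_node(). Only the early-return
 * check quoted above is modelled; the identity mapping is an assumption.
 */
static int model_map_pxm_to_node(int pxm, bool numa_off)
{
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS || numa_off)
		return NUMA_NO_NODE;
	return pxm;
}

/*
 * Hypothetical stand-in for the kmem probe: a region without a valid
 * node is rejected with -EINVAL (-22 on Linux, as in the log).
 */
static int model_kmem_probe(int node)
{
	if (node < 0)
		return -EINVAL;
	return 0;
}
```

With numa_off set, model_kmem_probe(model_map_pxm_to_node(0, true)) fails
with -EINVAL whatever the pxm value is, which is why I don't see how [1]
alone changes the outcome.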

> > because someone somewhere will have _PXM in a DSDT but will
> > have a non existent SRAT. We had this happen on an AMD platform when
> we
> > tried to introduce working _PXM support for PCI. [2]
> >
> > So whilst this seems superficially safe, I'd definitely be crossing your
> fingers.
> > Note, at that time I proposed putting the numa_off = false into the x86
> code
> > path precisely to cut out that possibility (was rejected at the time, at
> least
> > partly because the clarifications to the ACPI spec were not public.)
> >
> > The patch in [1] should sort things out however by ensuring we only
> create
> > new domains where we should actually be doing so. However, in your case
> > it will return NUMA_NO_NODE anyway so this isn't the right way to fix
> things.

Okay, let me try to summarize. There are three possible ways to fix this:
1. This patch, which it seems neither you nor David is satisfied with :)
2. My previous proposal [2], similar to what David suggested
3. Remove the numa_off check in acpi_map_pxm_to_node(), e.g.
	...
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS /*|| numa_off*/)
		return NUMA_NO_NODE;
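
As a rough illustration of what option 3 changes, the sketch below compares
the current check with the relaxed one in user space. It is not a patch:
map_pxm_current() and map_pxm_option3() are hypothetical names, the
MAX_PXM_DOMAINS value is illustrative, and the identity mapping stands in
for the real pxm-to-node table.

```c
#include <stdbool.h>

#define NUMA_NO_NODE    (-1)
#define MAX_PXM_DOMAINS 256	/* illustrative value (assumption) */

/* Current behaviour: numa_off forces NUMA_NO_NODE for every pxm. */
static int map_pxm_current(int pxm, bool numa_off)
{
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS || numa_off)
		return NUMA_NO_NODE;
	return pxm;	/* identity mapping, assumption for the sketch */
}

/* Option 3: only the range check remains, numa_off is ignored here. */
static int map_pxm_option3(int pxm, bool numa_off)
{
	(void)numa_off;	/* intentionally unused in this variant */
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
		return NUMA_NO_NODE;
	return pxm;	/* identity mapping, assumption for the sketch */
}
```

In this model a valid pxm still yields a node under option 3 when numa_off
is set, while out-of-range values are rejected as before. Whether that is
safe on platforms that have _PXM but no SRAT is the open question.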

[2] https://lkml.org/lkml/2019/8/16/367


--
Cheers,
Justin (Jia He)