Re: [PATCH v3 00/10] EFI Specific Purpose Memory Support

From: Dan Williams
Date: Fri Jun 07 2019 - 16:41:37 EST


On Fri, Jun 7, 2019 at 12:57 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 6/7/19 12:27 PM, Dan Williams wrote:
> > In support of optionally allowing either application-exclusive or
> > core-kernel-mm managed access to differentiated memory, claim
> > EFI_MEMORY_SP ranges for exposure as device-dax instances by default.
> > Such instances can be directly owned / mapped by a
> > platform-topology-aware application. Alternatively, with the new kmem
> > facility [4], the administrator has the option to instead designate that
> > those memory ranges be hot-added to the core-kernel-mm as a unique
> > memory numa-node. In short, allow for the decision about what software
> > agent manages specific-purpose memory to be made at runtime.
>
> It's probably worth noting that the reason the memory lands into the
> state of being controlled by device-dax by default is that device-dax is
> nice. It's actually willing and able to give up ownership of the memory
> when we ask. If we added to the core-mm, we'd almost certainly not be
> able to get it back reliably.
>
> Anyway, thanks for doing these, and I really hope that the world's
> BIOSes actually use this flag.

It should be noted that the flag is necessary, but not sufficient, to
route this memory range to device-dax. The BIOS must also publish ACPI
HMAT performance data for the range so that the OS has a chance of
knowing *why* the memory is "reserved for a specific purpose", and so
that it can delineate the boundaries of multiple
performance-differentiated memory ranges that might be combined into
one shared / contiguous EFI memory descriptor.
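
To illustrate the delineation problem, here is a rough userspace sketch,
not code from this series. It assumes the firmware tables have already
been parsed into a list of address ranges that each carry distinct
performance data. "struct range" and split_by_perf_ranges() are
hypothetical names, and where the range list comes from is left open.
The sketch only clips those ranges against one EFI descriptor:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified type, not the kernel's efi_memory_desc_t. */
struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Clip each performance-differentiated range against a single EFI
 * memory descriptor and store the non-empty overlaps in out[].
 * Returns the number of entries written (at most max_out).
 */
size_t split_by_perf_ranges(struct range desc,
			    const struct range *perf, size_t nperf,
			    struct range *out, size_t max_out)
{
	size_t n = 0;

	assert(desc.start <= desc.end);

	for (size_t i = 0; i < nperf && n < max_out; i++) {
		uint64_t s = perf[i].start > desc.start ? perf[i].start : desc.start;
		uint64_t e = perf[i].end < desc.end ? perf[i].end : desc.end;

		if (s < e) {	/* keep only ranges that overlap the descriptor */
			out[n].start = s;
			out[n].end = e;
			n++;
		}
	}
	return n;
}
```

Each entry in out[] would then be a candidate for its own device-dax
instance, whereas the descriptor alone only gives the outer bounds.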

With no HMAT the memory will be reserved, but no dax-device will be
surfaced. Perhaps this implementation also needs a WARN_TAINT(...,
TAINT_FIRMWARE_WORKAROUND...) to scream about a BIOS that fails to
publish the required HMAT entries, or, perhaps even better, a command
line option to ignore the flag so that the core-mm can pick up the
memory by default?
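
Roughly, the policy being discussed could be sketched as below. This is
a userspace illustration under stated assumptions, not the
implementation in this series: "struct sp_policy", "enum sp_route",
route_range(), and the ignore_sp knob are hypothetical names, and the
fprintf() merely stands in for whatever warning the kernel would
actually emit. Only the EFI_MEMORY_SP bit value is taken from the UEFI
specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EFI_MEMORY_SP 0x40000ULL	/* UEFI specific-purpose attribute bit */

/* Hypothetical inputs to the routing decision. */
struct sp_policy {
	bool have_hmat;	/* firmware published HMAT data for the range */
	bool ignore_sp;	/* hypothetical "ignore the flag" command line option */
};

enum sp_route {
	ROUTE_CORE_MM,		/* treated as ordinary System RAM */
	ROUTE_DEVICE_DAX,	/* surfaced as a device-dax instance */
	ROUTE_RESERVED_ONLY,	/* reserved, but no dax-device surfaced */
};

enum sp_route route_range(uint64_t attribute, const struct sp_policy *p)
{
	assert(p != NULL);

	/* No flag, or the admin asked to ignore it: core-mm picks it up. */
	if (!(attribute & EFI_MEMORY_SP) || p->ignore_sp)
		return ROUTE_CORE_MM;

	/* Flag plus HMAT data: necessary and sufficient for device-dax. */
	if (p->have_hmat)
		return ROUTE_DEVICE_DAX;

	/* Flag without HMAT: complain about the firmware, reserve only. */
	fprintf(stderr, "firmware: EFI_MEMORY_SP range without HMAT data\n");
	return ROUTE_RESERVED_ONLY;
}
```

The third case is the one the warning would target: the memory is held
back from the core-mm, yet nothing can consume it until the BIOS is
fixed or the flag is ignored.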