Darrick J. Wong wrote:
[add tytso to cc since he asked about "How do you actually /get/ fsdax
mode these days?" this morning]
On Tue, Oct 25, 2022 at 10:56:19AM -0700, Darrick J. Wong wrote:
On Tue, Oct 25, 2022 at 02:26:50PM +0000, ruansy.fnst@xxxxxxxxxxx wrote:
Nope. Since the announcement of pmem as a product, I have had 15
minutes of access to one preproduction prototype server with actual
Optane DIMMs in it.
I have /never/ had access to real hardware to test any of this, so it's
all configured via libvirt to simulate pmem in qemu:
https://lore.kernel.org/linux-xfs/YzXsavOWMSuwTBEC@magnolia/
/run/mtrdisk/[gh].mem are both regular files on a tmpfs filesystem:
$ grep mtrdisk /proc/mounts
none /run/mtrdisk tmpfs rw,relatime,size=82894848k,inode64 0 0
$ ls -la /run/mtrdisk/[gh].mem
-rw-r--r-- 1 libvirt-qemu kvm 10739515392 Oct 24 18:09 /run/mtrdisk/g.mem
-rw-r--r-- 1 libvirt-qemu kvm 10739515392 Oct 24 19:28 /run/mtrdisk/h.mem
Also forgot to mention that the VM with the fake pmem attached has a
script to do:
ndctl create-namespace --mode fsdax --map dev -e namespace0.0 -f
ndctl create-namespace --mode fsdax --map dev -e namespace1.0 -f
Every time the pmem device gets recreated, because apparently that's the
only way to get S_DAX mode nowadays?
If you have noticed a change here, it is due to the VM configuration,
not anything in the driver.
If you are interested, there are two ways to get pmem declared: the
legacy way that predates any of the DAX work, which the kernel calls
E820_PRAM, and the modern way, via platform firmware tables like ACPI
NFIT. The assumption with E820_PRAM is that it is dealing with
battery-backed NVDIMMs of small capacity. In that case the /dev/pmem
device can support DAX operation by default, because the memory needed
for the 'struct page' array covering that capacity is likely small.
Platform-firmware-defined PMEM can be terabytes. So the driver does not
enable DAX by default, because the user needs to make a policy choice
about burning gigabytes of DRAM for that metadata, or placing it in
PMEM, which is abundant but slower. So what I suspect is happening is
that your configuration changed from something that auto-allocated the
'struct page' array to something that needs the commands you list above
to explicitly opt in to reserving some PMEM capacity for the page
metadata.
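To put a rough number on "gigabytes of DRAM for that metadata", here is a
back-of-the-envelope sketch. It assumes a 4 KiB base page and a 64-byte
'struct page', which is typical for x86_64 but is not stated anywhere in
this thread. The function name memmap_overhead_bytes() is made up for
illustration. (For what it's worth, the --map option to ndctl
create-namespace is what picks between keeping this array in DRAM,
"mem", or on the pmem device itself, "dev", as in the commands above.)

```python
# Back-of-the-envelope estimate of 'struct page' overhead for PMEM.
# ASSUMPTIONS (typical x86_64, not taken from the thread):
PAGE_SIZE = 4096        # base page size in bytes
STRUCT_PAGE_SIZE = 64   # sizeof(struct page) in bytes

GiB = 1 << 30
TiB = 1 << 40


def memmap_overhead_bytes(pmem_bytes: int,
                          page_size: int = PAGE_SIZE,
                          struct_page_size: int = STRUCT_PAGE_SIZE) -> int:
    """Bytes of 'struct page' metadata needed to cover pmem_bytes of PMEM.

    Hypothetical helper: one struct page per base page of capacity.
    """
    pages = pmem_bytes // page_size
    return pages * struct_page_size


if __name__ == "__main__":
    # Small battery-backed NVDIMM vs. firmware-described terabyte PMEM.
    for capacity in (16 * GiB, 1 * TiB, 6 * TiB):
        overhead = memmap_overhead_bytes(capacity)
        print(f"{capacity / GiB:8.0f} GiB of PMEM -> "
              f"{overhead / GiB:6.2f} GiB of struct page")
```

Under these assumptions the overhead is 64/4096, about 1.6% of capacity.
That is 256 MiB for a 16 GiB NVDIMM, but 16 GiB of metadata per TiB of
PMEM, which is why the larger case is left as an explicit choice.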