[RFC PATCH 5/6] device-dax: relax the memblock size alignment for kmem_start

From: Jia He
Date: Tue Jul 28 2020 - 23:35:55 EST


Previously, kmem_start in dev_dax_kmem_probe should be aligned with
SECTION_SIZE_BITS(30), i.e. 1G memblock size on arm64. Even with Dan
Williams' sub-section patch series, it was not helpful when adding the
dax pmem kmem to memblock:
$ndctl create-namespace -e namespace0.0 --mode=devdax --map=dev -s 2g -f -a 2M
$echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
$echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
$cat /proc/iomem
...
23c000000-23fffffff : System RAM
23dd40000-23fecffff : reserved
23fed0000-23fffffff : reserved
240000000-33fdfffff : Persistent Memory
240000000-2403fffff : namespace0.0
280000000-2bfffffff : dax0.0 <- boundary are aligned with 1G
280000000-2bfffffff : System RAM (kmem)
$ lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000040000000-0x000000023fffffff 8G online yes 1-8
0x0000000280000000-0x00000002bfffffff 1G online yes 10

Memory block size: 1G
Total online memory: 9G
Total offline memory: 0B
...
Hence there is a big gap between 0x2403fffff and 0x280000000 due to the 1G
alignment on arm64. More than that, only 1G memory is returned while 2G is
requested.

On x86, the gap is relatively small due to SECTION_SIZE_BITS(27).

Besides descreasing SECTION_SIZE_BITS on arm64, we can relax the alignment
when adding the kmem.
After this patch:
240000000-33fdfffff : Persistent Memory
240000000-2421fffff : namespace0.0
242400000-2bfffffff : dax0.0
242400000-2bfffffff : System RAM (kmem)
$ lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000040000000-0x00000002bfffffff 10G online yes 1-10

Memory block size: 1G
Total online memory: 10G
Total offline memory: 0B

Notes, block 9-10 are the newly hotplug added.

This patches remove the tight alignment constraint of
memory_block_size_bytes(), but still keep the constraint from
online_pages_range().

Signed-off-by: Jia He <justin.he@xxxxxxx>
---
drivers/dax/kmem.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index d77786dc0d92..849d0706dfe0 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -30,9 +30,20 @@ int dev_dax_kmem_probe(struct device *dev)
const char *new_res_name;
int numa_node;
int rc;
+ int order;

- /* Hotplug starting at the beginning of the next block: */
- kmem_start = ALIGN(res->start, memory_block_size_bytes());
+ /* kmem_start needn't be aligned with memory_block_size_bytes().
+ * But given the constraint in online_pages_range(), adjust the
+ * alignment of kmem_start and kmem_size
+ */
+ kmem_size = resource_size(res);
+ order = min_t(int, MAX_ORDER - 1, get_order(kmem_size));
+ kmem_start = ALIGN(res->start, 1ul << (order + PAGE_SHIFT));
+ /* Adjust the size down to compensate for moving up kmem_start: */
+ kmem_size -= kmem_start - res->start;
+ /* Align the size down to cover only complete blocks: */
+ kmem_size &= ~((1ul << (order + PAGE_SHIFT)) - 1);
+ kmem_end = kmem_start + kmem_size;

/*
* Ensure good NUMA information for the persistent memory.
@@ -48,13 +59,6 @@ int dev_dax_kmem_probe(struct device *dev)
numa_node, res);
}

- kmem_size = resource_size(res);
- /* Adjust the size down to compensate for moving up kmem_start: */
- kmem_size -= kmem_start - res->start;
- /* Align the size down to cover only complete blocks: */
- kmem_size &= ~(memory_block_size_bytes() - 1);
- kmem_end = kmem_start + kmem_size;
-
new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
if (!new_res_name)
return -ENOMEM;
--
2.17.1