Re: [RFC] arm64: mm: update max_pfn after memory hotplug

From: David Hildenbrand
Date: Fri Sep 24 2021 - 04:17:57 EST


On 24.09.21 04:47, Florian Fainelli wrote:


On 9/23/2021 3:54 PM, Chris Goldsworthy wrote:
From: Sudarshan Rajagopalan <quic_sudaraja@xxxxxxxxxxx>

After new memory blocks have been hotplugged, max_pfn and max_low_pfn
need updating to reflect the new PFNs that were hot-added to the system.

Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@xxxxxxxxxxx>
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@xxxxxxxxxxx>
---
arch/arm64/mm/mmu.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index cfd9deb..fd85b51 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1499,6 +1499,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	if (ret)
 		__remove_pgd_mapping(swapper_pg_dir,
 				     __phys_to_virt(start), size);
+	else {
+		max_pfn = PFN_UP(start + size);
+		max_low_pfn = max_pfn;
+	}

This is a drive-by review, but it got me thinking about your changes a bit:

- if you raise max_pfn when you hotplug memory, don't you need to lower
it when you hot-unplug memory as well?

The issue with lowering is that you would actually have to search for the new maximum value -- and it's not really worth the trouble. Raising the limit is easy.

With memory hotunplug, anybody wanting to take a look at a "struct page" via a pfn has to go through pfn_to_online_page() either way. That will fail if there isn't actually a memmap anymore because the memory has been unplugged. So "max_pfn" is really just a hint as to the maximum pfn worth looking at, and it can be bigger than the largest pfn that is actually online.

Take a look at the example usage in fs/proc/page.c:kpageflags_read():

pfn_to_online_page() will simply fail, and stable_page_flags() will report KPF_NOPAGE.

It's just as if we now had a big memory hole at the end of memory.
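A minimal sketch of that pattern (condensed from kpageflags_read(); locals and error handling trimmed):

while (count > 0) {
	/*
	 * Returns NULL if there is no online memmap for this pfn,
	 * e.g., because the memory was unplugged (or never existed).
	 */
	struct page *ppage = pfn_to_online_page(pfn);

	/* stable_page_flags(NULL) reports 1 << KPF_NOPAGE */
	if (put_user(stable_page_flags(ppage), out))
		return -EFAULT;

	pfn++;
	out++;
	count -= KPMSIZE;
}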


- suppose that you have a platform which maps physical memory into the
CPU's address space at 0x00_4000_0000 (1GB offset) and the kernel boots
with 2GB of DRAM plugged by default. At that point we have not
registered a swiotlb because we have less than 4GB of addressable
physical memory, there is no IOMMU in that system, it's a happy world.
Now assume that we plug an additional 2GB of DRAM into that system
adjacent to the previous 2GB, from 0x00_C000_0000 through
0x01_4000_0000. Now we have physical addresses above 4GB, but we still
don't have a swiotlb; some of our DMA_BIT_MASK(32) peripherals are going
to be unable to DMA from that hot-plugged memory, but they could if we
had a swiotlb.

That's why platforms that hotplug memory should indicate the maximum possible PFN via some mechanism during boot. On x86-64 (and IIRC also arm64 now), this is done via the ACPI SRAT.

And that's where "max_possible_pfn" and "max_pfn" differ. See drivers/acpi/numa/srat.c:acpi_numa_memory_affinity_init():

max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));


Using max_possible_pfn, the OS can properly set up the swiotlb, even though it wouldn't currently be required when just looking at max_pfn.
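On x86-64, for example, the swiotlb decision keys off max_possible_pfn rather than max_pfn; roughly (simplified from arch/x86/kernel/pci-swiotlb.c, details vary by kernel version):

int __init pci_swiotlb_detect_4gb(void)
{
	/*
	 * If any RAM could ever show up above 4GB -- even if none is
	 * there yet -- reserve bounce buffers for 32-bit DMA devices.
	 */
	if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
		swiotlb = 1;

	return swiotlb;
}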

I documented that for virtio-mem in
https://virtio-mem.gitlab.io/user-guide/user-guide-linux.html
"swiotlb and DMA memory".


- now let's go even further, though this is very contrived. Assume that the
firmware has created a reserved memory region with a 'no-map'
attribute, thus indicating it does not want a struct page to be created
for a specific PFN range. Is it valid to "blindly" raise max_pfn if that
region were to be at the end of the just hot-plugged memory?

no-map means that no direct mapping is to be created, right? We would still have a memmap IIRC, and the pages are PG_reserved.
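In other words (a minimal sketch of my understanding, not verified against the arm64 no-map handling):

/*
 * Sketch: how a pfn inside such a no-map region would look to generic
 * code, assuming the memmap exists and the pages are PG_reserved.
 */
struct page *page = pfn_to_online_page(pfn);

if (page && PageReserved(page)) {
	/* There is a memmap, but the memory itself must not be touched. */
}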

Again, I think this is very similar to just having no-map regions like random memory holes within the existing memory layout.


What Chris proposes here is very similar to arch/x86/mm/init_64.c:update_end_of_memory_vars() called during arch_add_memory()->add_pages() on x86-64.
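For reference, that helper roughly looks like this; note that it only ever raises the limits, which also side-steps the lowering question above:

static void update_end_of_memory_vars(u64 start, u64 size)
{
	unsigned long end_pfn = PFN_UP(start + size);

	/* Only grow the limits; hot-unplug does not shrink them. */
	if (end_pfn > max_pfn) {
		max_pfn = end_pfn;
		max_low_pfn = end_pfn;
		max_possible_pfn = end_pfn;
	}
}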

--
Thanks,

David / dhildenb