Re: [PATCH 2/2] mm: convert do_set_pmd() to take a folio

From: David Hildenbrand
Date: Thu May 08 2025 - 03:36:27 EST


On 08.05.25 01:46, Zi Yan wrote:
On 7 May 2025, at 17:24, David Hildenbrand wrote:

On 07.05.25 14:10, Matthew Wilcox wrote:
On Wed, May 07, 2025 at 05:26:13PM +0800, Baolin Wang wrote:
In do_set_pmd(), we always use the folio->page to build PMD mappings for
the entire folio. Since all callers of do_set_pmd() already hold a stable
folio, converting do_set_pmd() to take a folio is safe and more straightforward.

What testing did you do of this?

-vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
+vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	struct vm_area_struct *vma = vmf->vma;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	pmd_t entry;
 	vm_fault_t ret = VM_FAULT_FALLBACK;
+	struct page *page;

Because I see nowhere in this patch that you initialise 'page'.

And that's really the important part. You seem to be assuming that a
folio will never be larger than PMD size, and I'm not comfortable with
that assumption. It's a limitation I put in place a few years ago so we
didn't have to find and fix all those assumptions immediately, but I
imagine that some day we'll want to have larger folios.

So unless you can derive _which_ page in the folio we want to map from
the vmf, NACK this patch.

Agreed. Probably folio + idx is our best bet.

Which raises an interesting question: I assume that in the future, when we have a 4 MiB folio on x86-64 that is *misaligned* in VA space with respect to PMDs (e.g., aligned to 1 MiB but not 2 MiB), we could still allow using a PMD for the middle part.
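
To make the "middle part" concrete, here is a rough userspace sketch of the VA-side arithmetic only. It assumes 2 MiB PMDs as on x86-64; pmd_span_in_range() and struct pmd_span are hypothetical names for illustration, not kernel helpers, and the sketch says nothing about whether the physical side would permit such a mapping.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SIZE (UINT64_C(1) << 21)      /* assumption: 2 MiB PMD mappings */
#define PMD_MASK (~(PMD_SIZE - 1))

/* Hypothetical result type: the PMD-aligned part of a VA range. */
struct pmd_span {
	uint64_t start;   /* first PMD-aligned address at or above the range start */
	uint64_t nr;      /* number of whole PMD-sized chunks that fit */
};

/* Compute which part of [va, va + size) could be covered by whole PMDs. */
static struct pmd_span pmd_span_in_range(uint64_t va, uint64_t size)
{
	struct pmd_span span = { 0, 0 };
	uint64_t first, last;

	/* Keep the arithmetic far away from overflow (illustrative limit). */
	assert(va < (UINT64_C(1) << 48) && size < (UINT64_C(1) << 48));

	first = (va + PMD_SIZE - 1) & PMD_MASK;   /* round the start up */
	last = (va + size) & PMD_MASK;            /* round the end down */

	span.start = first;
	if (last > first)
		span.nr = (last - first) / PMD_SIZE;
	return span;
}
```

For a 4 MiB range starting at a 1 MiB offset, this yields exactly one PMD-sized chunk in the middle, with 1 MiB left over on each side that would need smaller mappings.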

It might not be possible if the folio comes from the buddy allocator, due to how
the buddy allocator merges a PFN with its buddy (see __find_buddy_pfn() in mm/internal.h).
A 4MB folio will always consist of two 2MB-aligned parts. In addition, VA and PA need
to have the same lower 9+12 bits for a PMD mapping. So PMD mappings for
a 4MB folio would always be two PMDs. Let me know if I missed anything.
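
The argument above can be sketched in userspace C. This is only an illustration, assuming 4 KiB base pages and 2 MiB PMDs as on x86-64. find_buddy_pfn() mirrors the XOR used by __find_buddy_pfn(); merged_pfn() and low_bits_match() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                        /* assumption: 4 KiB base pages */
#define PMD_SHIFT  21                        /* 9 + 12 bits, i.e. 2 MiB */
#define PMD_ORDER  (PMD_SHIFT - PAGE_SHIFT)  /* order 9 */

/* Same XOR trick as __find_buddy_pfn(): flip bit 'order' of the PFN. */
static uint64_t find_buddy_pfn(uint64_t pfn, unsigned int order)
{
	assert(order < 64);
	return pfn ^ (UINT64_C(1) << order);
}

/*
 * First PFN of the block obtained by merging 'pfn' with its buddy.
 * Because the two PFNs differ only in bit 'order', the AND clears that
 * bit, so the merged block is naturally aligned to order + 1.
 */
static uint64_t merged_pfn(uint64_t pfn, unsigned int order)
{
	return pfn & find_buddy_pfn(pfn, order);
}

/* A PMD mapping needs VA and PA to agree in their lower 9+12 bits. */
static bool low_bits_match(uint64_t va, uint64_t pa)
{
	uint64_t mask = (UINT64_C(1) << PMD_SHIFT) - 1;

	return (va & mask) == (pa & mask);
}
```

Merging two order-9 blocks always produces a block whose first PFN is a multiple of 1024 pages (4 MiB), which is why both halves end up 2 MiB aligned in PA. A VA that is only 1 MiB aligned then fails low_bits_match() against such a PA.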

PA is clear. But is mis-alignment in VA space impossible on all architectures? I certainly remember it being impossible on x86-64 and s390x (remaining PMD entry bits used for something else).

--
Cheers,

David / dhildenb