Re: [PATCH 2/3] mm: Add generic helper to hint a large folio

From: Dev Jain
Date: Thu May 08 2025 - 01:08:34 EST




On 07/05/25 3:33 pm, David Hildenbrand wrote:
On 06.05.25 07:00, Dev Jain wrote:
To use PTE batching, we want to determine whether the folio mapped by
the PTE is large, thus requiring the use of vm_normal_folio(). We want
to avoid the cost of vm_normal_folio() if the code path doesn't already
require the folio. For arm64, pte_batch_hint() does the job. To generalize
this hint, add a helper which will determine whether two consecutive PTEs
point to consecutive PFNs, in which case there is a high probability that
the underlying folio is large.

Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
---
  include/linux/pgtable.h | 16 ++++++++++++++++
  1 file changed, 16 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b50447ef1c92..28e21fcc7837 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -369,6 +369,22 @@ static inline pgd_t pgdp_get(pgd_t *pgdp)
  }
  #endif
+/* Caller must ensure that ptep + 1 exists */
+static inline bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
+{
+    pte_t *next_ptep, next_pte;
+
+    if (pte_batch_hint(ptep, pte) != 1)
+        return true;
+
+    next_ptep = ptep + 1;
+    next_pte = ptep_get(next_ptep);
+    if (!pte_present(next_pte))
+        return false;
+
+    return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == PAGE_SIZE);
+}

So, where we want to use that is:

if (pte_present(old_pte)) {
    if ((max_nr != 1) && maybe_contiguous_pte_pfns(old_ptep, old_pte)) {
        struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);

        if (folio && folio_test_large(folio))
            nr = folio_pte_batch(folio, old_addr, old_ptep,
                         old_pte, max_nr, fpb_flags, NULL, NULL, NULL);
    }
}

where we won't need the folio later. But want it all part of the same folio?


And the simpler version would be


if (pte_present(old_pte)) {
    if (max_nr != 1) {
        struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);

        if (folio && folio_test_large(folio))
            nr = folio_pte_batch(folio, old_addr, old_ptep,
                         old_pte, max_nr, fpb_flags, NULL, NULL, NULL);
    }
}


Two things come to mind:

(1) Do we *really* care about the vm_normal_folio() + folio_test_large() call that much, that you
have to add this optimization ahead of times ? :)

For my mprotect series, I see a regression of almost (7.7 - 7.65)/7.7 = 0.65% for the small folio case. I am happy to remove this micro-optimization if that is the preference.


(2) Do we really need "must be part of the same folio", or could be just batch over present
ptes that map consecutive PFNs? In that case, a helper that avoids folio_pte_batch() completely
might be better.

I am not sure I get you here. folio_pte_batch() seems to be the simplest thing we can do as being done around in the code elsewhere, I am not aware of any alternate.