[PATCH] docs/mm: expand vma doc to highlight pte freeing, non-vma traversal

From: Lorenzo Stoakes
Date: Mon Jun 02 2025 - 17:07:51 EST


The process addresses documentation already contains a great deal of
information about mmap/VMA locking and page table traversal and
manipulation.

However, it waves its hands about non-VMA traversal. Add a section for this
and explain the caveats around this kind of traversal.

Additionally, commit 6375e95f381e ("mm: pgtable: reclaim empty PTE page in
madvise(MADV_DONTNEED)") caused zapping to also free empty PTE page
tables. Highlight this and reference how this impacts ptdump non-VMA
traversal of userland mappings.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
---
Documentation/mm/process_addrs.rst | 58 ++++++++++++++++++++++++++----
1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index e6756e78b476..83166c2b47dc 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -303,7 +303,9 @@ There are four key operations typically performed on page tables:
1. **Traversing** page tables - Simply reading page tables in order to traverse
them. This only requires that the VMA is kept stable, so a lock which
establishes this suffices for traversal (there are also lockless variants
- which eliminate even this requirement, such as :c:func:`!gup_fast`).
+ which eliminate even this requirement, such as :c:func:`!gup_fast`). There is
+ also a special case of page table traversal for non-VMA regions which we
+ consider separately below.
2. **Installing** page table mappings - Whether creating a new mapping or
modifying an existing one in such a way as to change its identity. This
requires that the VMA is kept stable via an mmap or VMA lock (explicitly not
@@ -335,15 +337,14 @@ ahead and perform these operations on page tables (though internally, kernel
operations that perform writes also acquire internal page table locks to
serialise - see the page table implementation detail section for more details).

+.. note:: Since v6.14 and commit 6375e95f381e ("mm: pgtable: reclaim empty PTE
+ page in madvise(MADV_DONTNEED)"), we now also free empty PTE tables
+ on zap. This does not change zapping locking requirements.
+
When **installing** page table entries, the mmap or VMA lock must be held to
keep the VMA stable. We explore why this is in the page table locking details
section below.

-.. warning:: Page tables are normally only traversed in regions covered by VMAs.
- If you want to traverse page tables in areas that might not be
- covered by VMAs, heavier locking is required.
- See :c:func:`!walk_page_range_novma` for details.
-
**Freeing** page tables is an entirely internal memory management operation and
has special requirements (see the page freeing section below for more details).

@@ -355,6 +356,47 @@ has special requirements (see the page freeing section below for more details).
from the reverse mappings, but no other VMAs can be permitted to be
accessible and span the specified range.

+Traversing non-VMA page tables
+------------------------------
+
+We've focused above on traversal of page tables belonging to VMAs. It is also
+possible to traverse page tables which are not represented by VMAs.
+
+Primarily this is used to traverse kernel page table mappings, in which case one
+must hold an mmap **read** lock on the :c:macro:`!init_mm` kernel instantiation
+of the :c:struct:`!struct mm_struct` metadata object, as performed in
+:c:func:`!walk_page_range_novma`.
+
+This is generally sufficient to preclude other page table walkers (excluding
+vmalloc regions and memory hot plug) as the intermediate kernel page tables are
+not usually freed.
+
+For cases where they might be freed, the caller has to acquire the appropriate
+additional locks.
+
+The truly unusual case is the traversal of non-VMA ranges within **userland**
+address space.
+
+This has only one user - the general page table dumping logic (implemented in
+:c:macro:`!mm/ptdump.c`) - which seeks to expose all mappings for debug purposes
+even if they are highly unusual (possibly architecture-specific) and are not
+backed by a VMA.
+
+We must take great care in this case, as the :c:func:`!munmap` implementation
+detaches VMAs under an mmap write lock before tearing down page tables under a
+downgraded mmap read lock.
+
+This means such a traversal could race with this teardown, and thus an mmap
+**write** lock is required.
+
+.. warning:: A racing zap operation is problematic if it is performed without an
+ exclusive lock held - since v6.14 and commit 6375e95f381e, empty PTE
+ tables may be freed upon zap, so if this occurs the traversal might
+ encounter the same issue seen due to :c:func:`!munmap`'s use of a
+ downgraded mmap lock.
+
+ In this instance, additional appropriate locking is required.
+
Lock ordering
-------------

@@ -461,6 +503,10 @@ Locking Implementation Details
Page table locking details
--------------------------

+.. note:: This section explores page table locking requirements for page tables
+ encompassed by a VMA. See the above section on non-VMA page table
+ traversal for details on how we handle that case.
+
In addition to the locks described in the terminology section above, we have
additional locks dedicated to page tables:

--
2.49.0