Re: [PATCH 3/3] fs: stable_page_flags(): use snapshot_page()

From: Luiz Capitulino
Date: Wed Jul 02 2025 - 13:37:04 EST


On 2025-07-01 14:44, David Hildenbrand wrote:
On 26.06.25 20:16, Luiz Capitulino wrote:
A race condition is possible in stable_page_flags() where user-space is
reading /proc/kpageflags concurrently with a folio split. This may lead to
oopses or BUG_ON()s being triggered.

To fix this, this commit uses snapshot_page() in stable_page_flags() so
that it works on stable page and folio snapshots instead.

Note that stable_page_flags() makes use of some functions that require
the original page or folio pointer to work properly (e.g.
is_free_buddy_page() and folio_test_idle()). Since those functions can't
be used on the page snapshot, we replace their usage with flags set by
snapshot_page() for this purpose.

Signed-off-by: Luiz Capitulino <luizcap@xxxxxxxxxx>
---
  fs/proc/page.c | 25 ++++++++++++++-----------
  1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 936f8bbe5a6f..a2ee95f727f0 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -147,6 +147,7 @@ static inline u64 kpf_copy_bit(u64 kflags, int ubit, int kbit)
  u64 stable_page_flags(const struct page *page)
  {
      const struct folio *folio;
+    struct page_snapshot ps;
      unsigned long k;
      unsigned long mapping;
      bool is_anon;
@@ -158,7 +159,9 @@ u64 stable_page_flags(const struct page *page)
       */
      if (!page)
          return 1 << KPF_NOPAGE;
-    folio = page_folio(page);
+
+    snapshot_page(&ps, page);
+    folio = &ps.folio_snapshot;
      k = folio->flags;
      mapping = (unsigned long)folio->mapping;
@@ -167,7 +170,7 @@ u64 stable_page_flags(const struct page *page)
      /*
       * pseudo flags for the well known (anonymous) memory mapped pages
       */
-    if (page_mapped(page))
+    if (folio_mapped(folio))
          u |= 1 << KPF_MMAP;
      if (is_anon) {
          u |= 1 << KPF_ANON;
@@ -179,7 +182,7 @@ u64 stable_page_flags(const struct page *page)
       * compound pages: export both head/tail info
       * they together define a compound page's start/end pos and order
       */
-    if (page == &folio->page)
+    if (ps.idx == 0)
          u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
      else
          u |= 1 << KPF_COMPOUND_TAIL;
@@ -189,10 +192,10 @@ u64 stable_page_flags(const struct page *page)
               folio_test_large_rmappable(folio)) {
          /* Note: we indicate any THPs here, not just PMD-sized ones */
          u |= 1 << KPF_THP;
-    } else if (is_huge_zero_folio(folio)) {
+    } else if (ps.flags & PAGE_SNAPSHOT_PG_HUGE_ZERO) {

For that, we could use

is_huge_zero_pfn(ps.pfn)

from

https://lkml.kernel.org/r/20250617154345.2494405-10-david@xxxxxxxxxx


You should be able to cherry-pick that commit (only a minor conflict in vm_normal_page_pmd()) and include it in this series.

OK, will do.
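
Just to make sure I read you right, I'd expect that hunk to end up roughly
like this (sketch only, assuming is_huge_zero_pfn() from your series is
available here):

} else if (is_huge_zero_pfn(ps.pfn)) {
    u |= 1 << KPF_ZERO_PAGE;
    u |= 1 << KPF_THP;
} else if (is_zero_pfn(ps.pfn)) {
    u |= 1 << KPF_ZERO_PAGE;
}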


          u |= 1 << KPF_ZERO_PAGE;
          u |= 1 << KPF_THP;
-    } else if (is_zero_folio(folio)) {
+    } else if (is_zero_pfn(ps.pfn)) {
          u |= 1 << KPF_ZERO_PAGE;
      }
@@ -200,14 +203,14 @@ u64 stable_page_flags(const struct page *page)
       * Caveats on high order pages: PG_buddy and PG_slab will only be set
       * on the head page.
       */
-    if (PageBuddy(page))
+    if (PageBuddy(&ps.page_snapshot))
          u |= 1 << KPF_BUDDY;
-    else if (page_count(page) == 0 && is_free_buddy_page(page))
+    else if (ps.flags & PAGE_SNAPSHOT_PG_FREE)

Yeah, that is nasty, and inherently racy. So detecting it at snapshot time might be best.

Which makes me wonder if this whole block should simply be

if (ps.flags & PAGE_SNAPSHOT_PG_BUDDY)
    u |= 1 << KPF_BUDDY;

and you move all buddy detection into the snapshotting function. That is, PAGE_SNAPSHOT_PG_BUDDY gets set for head and tail pages of buddy pages.

Looks less special that way ;)

I can do this too.
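
Rough, untested sketch of what I have in mind for the snapshot side
(assuming snapshot_page() still sees the original struct page):

/* in snapshot_page(): covers both head and tail pages of buddy pages */
if (PageBuddy(page) ||
    (page_count(page) == 0 && is_free_buddy_page(page)))
    ps->flags |= PAGE_SNAPSHOT_PG_BUDDY;

stable_page_flags() then only keeps the single PAGE_SNAPSHOT_PG_BUDDY
check you quoted above.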


          u |= 1 << KPF_BUDDY;
-    if (PageOffline(page))
+    if (folio_test_offline(folio))
          u |= 1 << KPF_OFFLINE;
-    if (PageTable(page))
+    if (folio_test_pgtable(folio))
          u |= 1 << KPF_PGTABLE;

I assume, long-term none of these will actually be folios. But we can change that once we get to it.

(likely, going back to pages ... just like for the slab case below)

      if (folio_test_slab(folio))
          u |= 1 << KPF_SLAB;
@@ -215,7 +218,7 @@ u64 stable_page_flags(const struct page *page)
  #if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
      u |= kpf_copy_bit(k, KPF_IDLE,          PG_idle);
  #else
-    if (folio_test_idle(folio))
+    if (ps.flags & PAGE_SNAPSHOT_PG_IDLE)
          u |= 1 << KPF_IDLE;

Another nasty 32-bit case. At least once we decouple pages from folios,
the whole test-idle handling in page-ext will vanish and this will get cleaned up.

Thanks for the review!