On 26.06.25 20:16, Luiz Capitulino wrote:
A race condition is possible in stable_page_flags() where user-space is
reading /proc/kpageflags concurrently to a folio split. This may lead to
oopses or BUG_ON()s being triggered.
To fix this, this commit uses snapshot_page() in stable_page_flags() so
that stable_page_flags() works with a stable page and folio snapshots
instead.
Note that stable_page_flags() makes use of some functions that require
the original page or folio pointer to work properly (eg.
is_free_budy_page() and folio_test_idle()). Since those functions can't
be used on the page snapshot, we replace their usage with flags that
were set by snapshot_page() for this purpose.
Signed-off-by: Luiz Capitulino <luizcap@xxxxxxxxxx>
---
fs/proc/page.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 936f8bbe5a6f..a2ee95f727f0 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -147,6 +147,7 @@ static inline u64 kpf_copy_bit(u64 kflags, int ubit, int kbit)
u64 stable_page_flags(const struct page *page)
{
const struct folio *folio;
+ struct page_snapshot ps;
unsigned long k;
unsigned long mapping;
bool is_anon;
@@ -158,7 +159,9 @@ u64 stable_page_flags(const struct page *page)
*/
if (!page)
return 1 << KPF_NOPAGE;
- folio = page_folio(page);
+
+ snapshot_page(&ps, page);
+ folio = &ps.folio_snapshot;
k = folio->flags;
mapping = (unsigned long)folio->mapping;
@@ -167,7 +170,7 @@ u64 stable_page_flags(const struct page *page)
/*
* pseudo flags for the well known (anonymous) memory mapped pages
*/
- if (page_mapped(page))
+ if (folio_mapped(folio))
u |= 1 << KPF_MMAP;
if (is_anon) {
u |= 1 << KPF_ANON;
@@ -179,7 +182,7 @@ u64 stable_page_flags(const struct page *page)
* compound pages: export both head/tail info
* they together define a compound page's start/end pos and order
*/
- if (page == &folio->page)
+ if (ps.idx == 0)
u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
else
u |= 1 << KPF_COMPOUND_TAIL;
@@ -189,10 +192,10 @@ u64 stable_page_flags(const struct page *page)
folio_test_large_rmappable(folio)) {
/* Note: we indicate any THPs here, not just PMD-sized ones */
u |= 1 << KPF_THP;
- } else if (is_huge_zero_folio(folio)) {
+ } else if (ps.flags & PAGE_SNAPSHOT_PG_HUGE_ZERO) {
For that, we could use
is_huge_zero_pfn(ps.pfn)
from
https://lkml.kernel.org/r/20250617154345.2494405-10-david@xxxxxxxxxx
You should be able to cherry pick that commit (only minor conflict in vm_normal_page_pmd()) and include it in this series.
u |= 1 << KPF_ZERO_PAGE;> + else if (ps.flags & PAGE_SNAPSHOT_PG_FREE)
u |= 1 << KPF_THP;
- } else if (is_zero_folio(folio)) {
+ } else if (is_zero_pfn(ps.pfn)) {
u |= 1 << KPF_ZERO_PAGE;
}
@@ -200,14 +203,14 @@ u64 stable_page_flags(const struct page *page)
* Caveats on high order pages: PG_buddy and PG_slab will only be set
* on the head page.
*/
- if (PageBuddy(page))
+ if (PageBuddy(&ps.page_snapshot))
u |= 1 << KPF_BUDDY;
- else if (page_count(page) == 0 && is_free_buddy_page(page))
Yeah, that is nasty, and inherently racy. So detecting it an snapshot time might be best.
Which makes me wonder if this whole block should simply be
if (ps.flags & PAGE_SNAPSHOT_PG_BUDDY)
u |= 1 << KPF_BUDDY;
and you move all buddy detection into the snapshotting function. That is, PAGE_SNAPSHOT_PG_BUDDY gets set for head and tail pages of buddy pages.
Looks less special that way ;)
u |= 1 << KPF_BUDDY;> - if (PageTable(page))> + if (folio_test_pgtable(folio))
- if (PageOffline(page))
+ if (folio_test_offline(folio))
u |= 1 << KPF_OFFLINE;
u |= 1 << KPF_PGTABLE;
I assume, long-term none of these will actually be folios. But we can change that once we get to it.
(likely, going back to pages ... just like for the slab case below)
if (folio_test_slab(folio))
u |= 1 << KPF_SLAB;
@@ -215,7 +218,7 @@ u64 stable_page_flags(const struct page *page)
#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
u |= kpf_copy_bit(k, KPF_IDLE, PG_idle);
#else
- if (folio_test_idle(folio))
+ if (ps.flags & PAGE_SNAPSHOT_PG_IDLE)
u |= 1 << KPF_IDLE;
Another nasty 32bit case. At least once we decouple pages from folios,
the while test-idle in page-ext will vanish and this will get cleaned up.