[RFC PATCH 43/45] mm: page_alloc: trigger defrag from allocator hot path on tainted-SPB pressure
From: Rik van Riel
Date: Thu Apr 30 2026 - 16:31:28 EST
From: Rik van Riel <riel@xxxxxxxx>
The per-SPB background defrag worker is currently triggered only from
spb_update_list(), which itself only fires when the SPB's category or
fullness bucket changes. Allocations that merely decrement the free
counters within the same bucket do not re-evaluate whether defrag is
needed.
A drgn dump on a saturated devvm showed several tainted SPBs with
defrag_last_no_progress_jiffies set hundreds to thousands of seconds
ago, long after their 5-second SPB_DEFRAG_NOOP_COOLDOWN had expired,
yet defrag had never been re-triggered on them. The shape of the
failure: a tainted SPB hits free=0; the worker tries once and makes no
progress (the movable pages sit mostly in mixed pageblocks, so
evacuating them leaves the source PB still occupied by unmov/recl
content); the no-progress cooldown is stamped; and no later allocator
event crosses a fullness bucket on that SPB, so spb_update_list()
never re-fires the trigger. The SPB sits stuck while subsequent
non-movable allocations end up tainting fresh clean SPBs via PASS_3.
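The missed re-evaluation can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct model_spb,
the model_*() helpers, the 512-page SPB size and the four-bucket split
are all hypothetical and are not taken from the series; only the
"trigger fires on a bucket change" behaviour comes from the description
above.

```c
#include <assert.h>

/* Hypothetical model constants, not the kernel's real values. */
#define MODEL_SPB_PAGES   512u
#define MODEL_NR_BUCKETS  4u

struct model_spb {
	unsigned int nr_free;        /* free pages left in this SPB */
	unsigned int bucket;         /* cached fullness bucket */
	unsigned int trigger_calls;  /* times the defrag trigger was evaluated */
};

/* Map a free-page count onto one of MODEL_NR_BUCKETS fullness buckets. */
static unsigned int model_bucket(unsigned int nr_free)
{
	unsigned int b = nr_free * MODEL_NR_BUCKETS / MODEL_SPB_PAGES;

	return b < MODEL_NR_BUCKETS ? b : MODEL_NR_BUCKETS - 1;
}

static void model_spb_init(struct model_spb *sb, unsigned int nr_free)
{
	assert(nr_free <= MODEL_SPB_PAGES);
	sb->nr_free = nr_free;
	sb->bucket = model_bucket(nr_free);
	sb->trigger_calls = 0;
}

/*
 * Stand-in for spb_update_list(): the trigger is only evaluated when
 * the fullness bucket changes.
 */
static void model_update_list(struct model_spb *sb)
{
	unsigned int b = model_bucket(sb->nr_free);

	if (b == sb->bucket)
		return;              /* same bucket: nothing re-evaluated */
	sb->bucket = b;
	sb->trigger_calls++;
}

/* Take pages from the SPB, then run the event-driven update. */
static void model_alloc(struct model_spb *sb, unsigned int pages)
{
	assert(pages <= sb->nr_free);
	sb->nr_free -= pages;
	model_update_list(sb);
}
```

In this model, an SPB initialised with 100 free pages and drained to 0
in 2-page steps never leaves bucket 0, so trigger_calls stays at 0 for
the whole run. An allocation that drops it from 130 to 120 free pages
crosses a bucket boundary and evaluates the trigger exactly once.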
Add two complementary triggers in __rmqueue_smallest():
(1) On every PASS_1/2/2B/2C/2D success that already evaluates
spb_below_shrink_high_water(sb) (i.e. the same threshold at
which queue_spb_slab_shrink() is fired), additionally call
spb_maybe_start_defrag(sb). This catches actively pressured
tainted SPBs immediately, with no extra hot-path predicate
evaluation.
(2) Just before the PASS_3 fall-through that risks tainting a fresh
clean SPB, walk the tainted-SPB list and call
spb_maybe_start_defrag() on each entry. This catches SPBs that are
stuck with no allocator activity to drive (1). The walk is bounded
by nr_tainted_spbs and only runs on the slow path that is about to
fragment the clean pool, where spending a list walk is acceptable.
The cooldown gate inside spb_needs_defrag() no-ops cheaply for SPBs
not yet eligible.
The cooldown still gates spb_needs_defrag(), so neither trigger
storms the worker.
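Why repeated trigger calls stay cheap can be sketched in userspace as
follows. This is an illustrative model, not the code in the series: the
bodies of spb_needs_defrag() and spb_maybe_start_defrag() are not shown
in this patch, so the model_*() functions, struct model_spb, MODEL_HZ
and the "0 means never stamped" convention are assumptions. Only the
5-second no-progress cooldown and the list walk come from the text
above.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ             1000UL            /* assumed tick rate */
#define MODEL_NOOP_COOLDOWN  (5 * MODEL_HZ)    /* models the 5-second cooldown */

struct model_spb {
	unsigned long last_no_progress;  /* tick of last no-progress run, 0 = never */
	bool work_queued;                /* defrag work already pending */
	unsigned int queue_count;        /* how often work was queued */
};

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Cooldown gate: refuse until the no-progress cooldown has expired. */
static bool model_needs_defrag(const struct model_spb *sb, unsigned long now)
{
	if (!sb->last_no_progress)
		return true;
	return model_time_after(now, sb->last_no_progress + MODEL_NOOP_COOLDOWN);
}

/* Queue work at most once; cheap no-op while gated or already queued. */
static bool model_maybe_start_defrag(struct model_spb *sb, unsigned long now)
{
	if (sb->work_queued || !model_needs_defrag(sb, now))
		return false;
	sb->work_queued = true;
	sb->queue_count++;
	return true;
}

/* Stand-in for trigger (2): walk every tainted SPB before PASS_3. */
static unsigned int model_kick_all(struct model_spb *spbs, unsigned int n,
				   unsigned long now)
{
	unsigned int i, kicked = 0;

	assert(spbs || !n);
	for (i = 0; i < n; i++)
		if (model_maybe_start_defrag(&spbs[i], now))
			kicked++;
	return kicked;
}
```

With two modelled SPBs stamped at ticks 1000 and 7000, a walk at tick
5000 queues nothing, a walk at tick 6001 queues work only for the first
SPB, and a further walk at tick 6002 is again a no-op because that work
is already pending.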
The existing spb_maybe_start_defrag() call inside spb_update_list()
is retained: it remains the trigger for the clean-SPB
within-superpageblock compaction path (spb_defrag_clean), which the
new alloc-path triggers do not cover (they only fire on SB_TAINTED
SPBs). Replacing the spb_update_list() call entirely would require a
separate clean-SPB-specific trigger in the allocator and is left for
a follow-up.
Also factor out the now-repeated tainted-alloc reaction into a helper,
spb_react_to_tainted_alloc(sb, zone), and call it from all 8
PASS_1/2/2B/2C/2D success sites in __rmqueue_smallest(). This
centralizes the gate (cat == SB_TAINTED &&
spb_below_shrink_high_water(sb)) and the shrink+defrag kick in one
place, removing duplication and per-success-site noise.
Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
mm/page_alloc.c | 73 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 53 insertions(+), 20 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index af499f0a1a48..e15e71d5ac99 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2709,6 +2709,30 @@ static inline bool spb_below_shrink_high_water(const struct superpageblock *sb)
(unsigned long)spb_tainted_reserve(sb) * pageblock_nr_pages;
}
+/*
+ * spb_react_to_tainted_alloc - kick reclaim machinery on a tainted-SPB alloc.
+ *
+ * Called from each PASS_1/2/2B/2C/2D success path after a successful
+ * allocation against a tainted SPB. If the SPB is below its shrink
+ * high-water mark, queue the SPB-driven slab shrink and try to start
+ * the per-SPB defrag worker. Both have their own cooldown gates inside,
+ * so this is cheap to call on every such allocation.
+ *
+ * Skips quickly when the SPB is not tainted (e.g. movable allocation
+ * landing on a clean SPB) or when the high-water mark hasn't been
+ * crossed.
+ */
+static inline void spb_react_to_tainted_alloc(struct superpageblock *sb,
+ struct zone *zone)
+{
+ if (spb_get_category(sb) != SB_TAINTED)
+ return;
+ if (!spb_below_shrink_high_water(sb))
+ return;
+ queue_spb_slab_shrink(zone);
+ spb_maybe_start_defrag(sb);
+}
+
/*
* On systems with many superpageblocks, we can afford to "write off"
* tainted superpageblocks by aggressively packing unmovable/reclaimable
@@ -2969,9 +2993,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
page = try_alloc_from_sb_pass1(zone, cpu_hint,
order, migratetype);
if (page) {
- if (spb_get_category(cpu_hint) == SB_TAINTED &&
- spb_below_shrink_high_water(cpu_hint))
- queue_spb_slab_shrink(zone);
+ spb_react_to_tainted_alloc(cpu_hint, zone);
trace_mm_page_alloc_zone_locked(page, order,
migratetype,
pcp_allowed_order(order) &&
@@ -2984,9 +3006,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
page = try_alloc_from_sb_pass1(zone, zone_hint,
order, migratetype);
if (page) {
- if (spb_get_category(zone_hint) == SB_TAINTED &&
- spb_below_shrink_high_water(zone_hint))
- queue_spb_slab_shrink(zone);
+ spb_react_to_tainted_alloc(zone_hint, zone);
slot->zone = zone;
slot->sb = zone_hint;
trace_mm_page_alloc_zone_locked(page, order,
@@ -3057,9 +3077,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
page_del_and_expand(zone, page,
order, current_order,
migratetype);
- if (cat == SB_TAINTED &&
- spb_below_shrink_high_water(sb))
- queue_spb_slab_shrink(zone);
+ if (cat == SB_TAINTED)
+ spb_react_to_tainted_alloc(sb, zone);
trace_mm_page_alloc_zone_locked(
page, order, migratetype,
pcp_allowed_order(order) &&
@@ -3088,9 +3107,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
page_del_and_expand(zone, page,
order, current_order,
migratetype);
- if (cat == SB_TAINTED &&
- spb_below_shrink_high_water(sb))
- queue_spb_slab_shrink(zone);
+ if (cat == SB_TAINTED)
+ spb_react_to_tainted_alloc(sb, zone);
trace_mm_page_alloc_zone_locked(
page, order, migratetype,
pcp_allowed_order(order) &&
@@ -3145,8 +3163,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
page = claim_whole_block(zone, page,
current_order, order,
migratetype, MIGRATE_MOVABLE);
- if (spb_below_shrink_high_water(sb))
- queue_spb_slab_shrink(zone);
+ spb_react_to_tainted_alloc(sb, zone);
trace_mm_page_alloc_zone_locked(
page, order, migratetype,
pcp_allowed_order(order) &&
@@ -3184,8 +3201,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
0, true);
if (!page)
continue;
- if (spb_below_shrink_high_water(sb))
- queue_spb_slab_shrink(zone);
+ spb_react_to_tainted_alloc(sb, zone);
trace_mm_page_alloc_zone_locked(
page, order, migratetype,
pcp_allowed_order(order) &&
@@ -3269,8 +3285,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
opposite_mt);
__spb_set_has_type(page,
migratetype);
- if (spb_below_shrink_high_water(sb))
- queue_spb_slab_shrink(zone);
+ spb_react_to_tainted_alloc(sb, zone);
trace_mm_page_alloc_zone_locked(
page, order, migratetype,
pcp_allowed_order(order) &&
@@ -3342,8 +3357,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
MIGRATE_MOVABLE);
__spb_set_has_type(page,
migratetype);
- if (spb_below_shrink_high_water(sb))
- queue_spb_slab_shrink(zone);
+ spb_react_to_tainted_alloc(sb, zone);
trace_mm_page_alloc_zone_locked(
page, order, migratetype,
pcp_allowed_order(order) &&
@@ -3371,6 +3385,25 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
queue_spb_slab_shrink(zone);
}
+ /*
+ * Last-chance defrag trigger before tainting a fresh clean SPB.
+ * Walk the tainted-SPB list and try to wake the per-SPB defrag
+ * worker on each. Catches SPBs that are stuck in expired-cooldown
+ * state because no allocator activity has touched them recently
+ * (the routine event-driven trigger from spb_update_list only
+ * fires on bucket transitions, not on every alloc). Once the
+ * cooldown has expired, spb_maybe_start_defrag() will requeue
+ * work; otherwise the gate inside spb_needs_defrag() no-ops
+ * cheaply. Bounded by nr_tainted_spbs and only runs when we are
+ * already on the slow path of fragmenting the clean pool.
+ */
+ for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
+ list_for_each_entry(sb,
+ &zone->spb_lists[SB_TAINTED][full], list) {
+ spb_maybe_start_defrag(sb);
+ }
+ }
+
/* Pass 3: whole pageblock from empty superpageblocks */
list_for_each_entry(sb, &zone->spb_empty, list) {
if (!sb->nr_free_pages)
--
2.52.0