Re: [PATCH mm-new v3 1/1] mm/khugepaged: abort collapse scan on non-swap entries

From: David Hildenbrand

Date: Wed Oct 08 2025 - 04:21:26 EST


On 08.10.25 05:26, Lance Yang wrote:
From: Lance Yang <lance.yang@xxxxxxxxx>

Currently, special non-swap entries (like PTE markers) are not caught
early in hpage_collapse_scan_pmd(), leading to failures deep in the
swap-in logic.

A function that is called __collapse_huge_page_swapin() and documented
to "Bring missing pages in from swap" will handle other types as well.

As analyzed by David[1], we could have ended up with the following
entry types right before do_swap_page():

(1) Migration entries. We would have waited.
-> Maybe worth it to wait, maybe not. We suspect we don't stumble
into that frequently such that we don't care. We could always
unlock this separately later.

(2) Device-exclusive entries. We would have converted to non-exclusive.
-> See make_device_exclusive(), we cannot tolerate PMD entries and
have to split them through FOLL_SPLIT_PMD. As popped up during
a recent discussion, collapsing here is actually
counter-productive, because the next conversion will PTE-map
it again.
-> Ok to not collapse.

(3) Device-private entries. We would have migrated to RAM.
-> Device-private still does not support THPs, so collapsing right
now just means that the next device access would split the
folio again.
-> Ok to not collapse.

(4) HWPoison entries
-> Cannot collapse

(5) Markers
-> Cannot collapse

First, this patch adds an early check for these non-swap entries. If
any one is found, the scan is aborted immediately with the
SCAN_PTE_NON_PRESENT result, as Lorenzo suggested[2], avoiding wasted
work. While at it, convert pte_swp_uffd_wp_any() to pte_swp_uffd_wp()
since we are in the swap pte branch.

Second, as Wei pointed out[3], we may have a chance to get a non-swap
entry, since we will drop and re-acquire the mmap lock before
__collapse_huge_page_swapin(). To handle this, we also add a
non_swap_entry() check there.

Note that we can unlock later what we really need, and not account it
towards max_swap_ptes.

[1] https://lore.kernel.org/linux-mm/09eaca7b-9988-41c7-8d6e-4802055b3f1e@xxxxxxxxxx
[2] https://lore.kernel.org/linux-mm/7df49fe7-c6b7-426a-8680-dcd55219c8bd@lucifer.local
[3] https://lore.kernel.org/linux-mm/20251005010511.ysek2nqojebqngf3@master

Acked-by: David Hildenbrand <david@xxxxxxxxxx>
Reviewed-by: Wei Yang <richard.weiyang@xxxxxxxxx>
Reviewed-by: Dev Jain <dev.jain@xxxxxxx>
Suggested-by: David Hildenbrand <david@xxxxxxxxxx>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
Signed-off-by: Lance Yang <lance.yang@xxxxxxxxx>
---

Sorry for not replying earlier to your other mail.

LGTM.

We can always handle migration entries later if this shows up to be a problem (this time, in a clean way ...) and not count them towards actual "swap" entries.

--
Cheers

David / dhildenb