Re: [PATCH v3 2/3] khugepaged: Optimize __collapse_huge_page_copy_succeeded() by PTE batching

From: Baolin Wang
Date: Tue Jul 22 2025 - 23:40:53 EST




On 2025/7/22 23:05, Dev Jain wrote:
Use PTE batching to optimize __collapse_huge_page_copy_succeeded().

On arm64, suppose khugepaged is scanning a pte-mapped 2MB THP for collapse.
Then, calling ptep_clear() for every pte will cause a TLB flush for every
contpte block. Instead, clear_ptes() does a contpte_try_unfold_partial()
which will flush the TLB only for the (if any) starting and ending contpte
block, if they partially overlap with the range khugepaged is looking at.

For all arches, there should be a benefit due to batching atomic operations
on mapcounts due to folio_remove_rmap_ptes() and saving some calls.

Signed-off-by: Dev Jain <dev.jain@xxxxxxx>

With David's comments addressed, LGTM.
Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>