Re: [PATCH v5 07/12] khugepaged: add mTHP support
From: Jann Horn
Date: Wed Apr 30 2025 - 16:52:35 EST
On Mon, Apr 28, 2025 at 8:12 PM Nico Pache <npache@xxxxxxxxxx> wrote:
> Introduce the ability for khugepaged to collapse to different mTHP sizes.
> While scanning PMD ranges for potential collapse candidates, keep track
> of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
> represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If
> mTHPs are enabled we remove the restriction of max_ptes_none during the
> scan phase so we dont bailout early and miss potential mTHP candidates.
>
> After the scan is complete we will perform binary recursion on the
> bitmap to determine which mTHP size would be most efficient to collapse
> to. max_ptes_none will be scaled by the attempted collapse order to
> determine how full a THP must be to be eligible.
>
> If a mTHP collapse is attempted, but contains swapped out, or shared
> pages, we dont perform the collapse.
[...]
> @@ -1208,11 +1211,12 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> vma_start_write(vma);
> anon_vma_lock_write(vma->anon_vma);
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
> - address + HPAGE_PMD_SIZE);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address,
> + _address + (PAGE_SIZE << order));
> mmu_notifier_invalidate_range_start(&range);
>
> pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
> +
> /*
> * This removes any huge TLB entry from the CPU so we won't allow
> * huge and small TLB entries for the same virtual address to
It's not visible in this diff, but we're about to do a
pmdp_collapse_flush() here. pmdp_collapse_flush() tears down the
entire page table, meaning it tears down 2MiB of address space; and it
assumes that the entire page table exclusively corresponds to the
current VMA.
I think you'll need to ensure that the pmdp_collapse_flush() only
happens for full-size THP, and that mTHP only tears down individual
PTEs in the relevant range. (That code might get a bit messy, since
the existing THP code tears down PTEs in a detached page table, while
mTHP would have to do it in a still-attached page table.)