Introduce the ability for khugepaged to collapse to different mTHP sizes.
While scanning PMD ranges for potential collapse candidates, keep track
of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER PTEs. If
mTHPs are enabled, we remove the max_ptes_none restriction during the
scan phase so that we don't bail out early and miss potential mTHP
candidates.
After the scan is complete, we perform a binary recursion on the
bitmap to determine which mTHP size would be most efficient to collapse
to. max_ptes_none is scaled by the attempted collapse order to
determine how full a range of that order must be to be eligible.
If an mTHP collapse is attempted but the range contains swapped-out or
shared pages, we don't perform the collapse.
For non-PMD collapses, we must leave the anon VMA write-locked until
after the mTHP has been collapsed.
-
- spin_lock(pmd_ptl);
- BUG_ON(!pmd_none(*pmd));
- folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
- folio_add_lru_vma(folio, vma);
- pgtable_trans_huge_deposit(mm, pmd, pgtable);
- set_pmd_at(mm, address, pmd, _pmd);
- update_mmu_cache_pmd(vma, address, pmd);
- deferred_split_folio(folio, false);
- spin_unlock(pmd_ptl);
+ if (order == HPAGE_PMD_ORDER) {
+ pgtable = pmd_pgtable(_pmd);
+ _pmd = folio_mk_pmd(folio, vma->vm_page_prot);
+ _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
+
+ spin_lock(pmd_ptl);
+ BUG_ON(!pmd_none(*pmd));
+ folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE);
+ folio_add_lru_vma(folio, vma);
+ pgtable_trans_huge_deposit(mm, pmd, pgtable);
+ set_pmd_at(mm, address, pmd, _pmd);
+ update_mmu_cache_pmd(vma, address, pmd);
+ deferred_split_folio(folio, false);
+ spin_unlock(pmd_ptl);
+ } else { /* mTHP collapse */
+ mthp_pte = mk_pte(&folio->page, vma->vm_page_prot);
+ mthp_pte = maybe_mkwrite(pte_mkdirty(mthp_pte), vma);
+
+ spin_lock(pmd_ptl);