Re: [PATCH v4 2/2] mm: Optimize mremap() by PTE batching
From: Pedro Falcato
Date: Thu Jun 12 2025 - 08:14:13 EST
On Tue, Jun 10, 2025 at 09:20:43AM +0530, Dev Jain wrote:
> Use folio_pte_batch() to optimize move_ptes(). On arm64, if the ptes
> are painted with the contig bit, then ptep_get() will iterate through all 16
> entries to collect a/d bits. Hence this optimization will result in a 16x
> reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
> will eventually call contpte_try_unfold() on every contig block, thus
> flushing the TLB for the complete large folio range. Instead, use
> get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
> do them on the starting and ending contig block.
>
> For split folios, there will be no pte batching; nr_ptes will be 1. For
> pagetable splitting, the ptes will still point to the same large folio;
> for arm64, this results in the optimization described above, and for other
> arches (including the general case), a minor improvement is expected due to
> a reduction in the number of function calls.
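
For illustration only, here is a rough userspace sketch of the batching idea
described above. Everything in it (struct toy_pte, toy_pte_batch(),
toy_walk_steps(), toy_fill_folio()) is a hypothetical stand-in, not kernel
API; the real folio_pte_batch() has a different signature and checks more
than pfn contiguity. It only shows why a walk over a large folio takes one
step per batch instead of one step per pte, and why split folios degrade to
nr_ptes == 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: the pfn it maps and the folio it belongs to. */
struct toy_pte {
	unsigned long pfn;
	int folio_id;
};

/*
 * Return how many entries, starting at ptes[0], map consecutive pfns of the
 * same folio, capped at max_nr. The result is always at least 1.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr)
{
	size_t nr = 1;

	assert(max_nr >= 1);
	while (nr < max_nr &&
	       ptes[nr].folio_id == ptes[0].folio_id &&
	       ptes[nr].pfn == ptes[0].pfn + nr)
		nr++;
	return nr;
}

/* Count the loop iterations needed to walk n entries, with or without batching. */
static size_t toy_walk_steps(const struct toy_pte *ptes, size_t n, int batched)
{
	size_t steps = 0;
	size_t i = 0;

	while (i < n) {
		size_t nr = batched ? toy_pte_batch(&ptes[i], n - i) : 1;

		i += nr;	/* advance past the whole batch */
		steps++;
	}
	return steps;
}

/* Fill n entries so that they map one folio at consecutive pfns. */
static void toy_fill_folio(struct toy_pte *ptes, size_t n,
			   unsigned long first_pfn, int folio_id)
{
	for (size_t i = 0; i < n; i++) {
		ptes[i].pfn = first_pfn + i;
		ptes[i].folio_id = folio_id;
	}
}
```

With two 16-entry folios filled back to back via toy_fill_folio(), the
unbatched walk takes one step per entry, while the batched walk takes one
step per folio. If every entry is given its own folio_id (the split-folio
case), the batched walk falls back to one step per entry.
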
>
> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
Reviewed-by: Pedro Falcato <pfalcato@xxxxxxx>
--
Pedro