On 09/05/25 6:15 am, Baolin Wang wrote:
When I tested the mincore() syscall, I observed that it takes longer with
64K mTHP enabled on my Arm64 server. The reason is that mincore_pte_range()
still checks each PTE individually, even when the PTEs are contiguous,
which is inefficient.
Thus we can use pte_batch_hint() to get the batch number of the present
contiguous PTEs, which can improve the performance. I tested the mincore()
syscall with 1G anonymous memory populated with 64K mTHP, and observed an
obvious performance improvement:
w/o patch    w/ patch    changes
6022us       549us       +91%
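
For illustration only, the batching idea can be modelled in userspace. This
is a toy sketch, not the kernel code from the patch: struct toy_pte,
toy_batch_hint() and toy_mincore_range() are hypothetical names, and the
block size of 16 entries assumes 4K base pages with 64K contiguous blocks.
The toy also marks non-present entries as 0, which simplifies what the real
mincore_pte_range() does for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 16 x 4K entries form one 64K contiguous block. */
#define TOY_CONT_PTES 16

/* Hypothetical stand-in for a page table entry. */
struct toy_pte {
	bool present;
	bool cont;	/* models a "contiguous" hint bit */
};

/*
 * Toy stand-in for pte_batch_hint(): number of entries from idx to the
 * end of its contiguous block, or 1 if the entry carries no hint.
 */
static unsigned int toy_batch_hint(const struct toy_pte *table, size_t idx)
{
	if (!table[idx].cont)
		return 1;
	return TOY_CONT_PTES - (unsigned int)(idx % TOY_CONT_PTES);
}

/*
 * Fill vec[] with one residency byte per entry and return how many
 * loop iterations (entry lookups) were needed.
 */
static size_t toy_mincore_range(const struct toy_pte *table, size_t nr,
				unsigned char *vec, bool use_hint)
{
	size_t lookups = 0;
	size_t i = 0;

	assert(table != NULL && vec != NULL);

	while (i < nr) {
		size_t step = 1;

		if (table[i].present && use_hint) {
			size_t batch = toy_batch_hint(table, i);
			size_t max_nr = nr - i;

			/* Never step past the end of the range. */
			step = batch < max_nr ? batch : max_nr;
		}

		/* One lookup decides the state of the whole batch. */
		for (size_t k = 0; k < step; k++)
			vec[i + k] = table[i].present ? 1 : 0;

		lookups++;
		i += step;
	}
	return lookups;
}
```

With 64 present entries that all carry the hint, the toy loop needs 64
lookups without batching and 4 with it.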
Moreover, I also tested mincore() with mTHP/THP disabled, and did not
see any obvious regression for base pages.
Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Nit: The subject line - s/pte_batch_bint()/pte_batch_hint()
Otherwise LGTM
Reviewed-by: Dev Jain <dev.jain@xxxxxxx>