Re: [PATCH v3] mm: mincore: use pte_batch_bint() to batch process large folios

From: Baolin Wang
Date: Fri May 09 2025 - 03:38:33 EST

On 2025/5/9 15:30, Dev Jain wrote:


On 09/05/25 6:15 am, Baolin Wang wrote:
When I tested the mincore() syscall, I observed that it takes longer with
64K mTHP enabled on my Arm64 server. The reason is that mincore_pte_range()
still checks each PTE individually, even when the PTEs are contiguous,
which is inefficient.

Thus we can use pte_batch_hint() to get the number of present, contiguous
PTEs in a batch and process them together, which improves performance. I
tested the mincore() syscall with 1G anonymous memory populated with 64K
mTHP, and observed an obvious performance improvement:

w/o patch        w/ patch        changes
6022us           549us           +91%
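
To illustrate why batching reduces the work, here is a minimal user-space
model in C. It is only a sketch under stated assumptions: struct model_pte,
model_batch_hint() and model_mincore_range() are hypothetical names made up
for illustration, not kernel code, and the real pte_batch_hint() operates on
actual page table entries rather than on a stored hint field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a page table entry; NOT the kernel's pte_t. */
struct model_pte {
	bool present;
	/* Assumed: entries left in the contiguous block starting here (0 or 1 if none). */
	unsigned int batch_hint;
};

/* Hypothetical stand-in for pte_batch_hint(): always returns at least 1. */
static unsigned int model_batch_hint(const struct model_pte *pte)
{
	return pte->batch_hint ? pte->batch_hint : 1;
}

/*
 * Fill vec[] with 1 for present entries and 0 otherwise, stepping over a
 * whole batch of present entries at once. Returns the number of loop
 * iterations, so the saving over a per-entry walk is visible.
 */
static size_t model_mincore_range(const struct model_pte *ptes, size_t nr,
				  unsigned char *vec)
{
	size_t iterations = 0;
	size_t i = 0;

	while (i < nr) {
		size_t step = 1;

		if (ptes[i].present) {
			step = model_batch_hint(&ptes[i]);
			/* Never step past the end of the requested range. */
			if (step > nr - i)
				step = nr - i;
			for (size_t j = 0; j < step; j++)
				vec[i + j] = 1;
		} else {
			vec[i] = 0;
		}

		i += step;
		iterations++;
	}

	return iterations;
}
```

In this model, a range that starts with one 16-entry contiguous block is
covered in a single iteration instead of 16, while non-present entries are
still visited one at a time.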

Moreover, I also tested mincore() with mTHP/THP disabled, and did not
see any obvious regression for base pages.
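
A measurement of this kind could be sketched in user space roughly as
follows. This is not the test program used for the numbers above, which is
not shown in this thread; time_mincore_us() is a hypothetical helper, and the
sketch does not configure mTHP/THP itself, so that has to be set up
separately before running it.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory, touch it so every page is populated,
 * then time a single mincore() call over the whole range.
 * Returns the elapsed time in microseconds, or -1 on error.
 * *resident receives the number of pages reported as resident.
 */
static long time_mincore_us(size_t len, size_t *resident)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t nr_pages = (len + (size_t)page - 1) / (size_t)page;
	struct timespec t0, t1;
	unsigned char *vec;
	void *buf;
	int ret;

	vec = malloc(nr_pages);
	if (!vec)
		return -1;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		free(vec);
		return -1;
	}

	/* Write to the whole range so the pages are actually allocated. */
	memset(buf, 1, len);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	ret = mincore(buf, len, vec);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*resident = 0;
	if (ret == 0) {
		/* Only the lowest bit of each vec byte carries residency. */
		for (size_t i = 0; i < nr_pages; i++)
			*resident += vec[i] & 1;
	}

	munmap(buf, len);
	free(vec);
	if (ret != 0)
		return -1;

	return (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000L;
}
```

Calling time_mincore_us(1UL << 30, &resident) would correspond to the 1G
case; the absolute numbers depend on the machine and the kernel
configuration.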

Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>

Nit: The subject line - s/pte_batch_bint()/pte_batch_hint()

Ah, fat finger. Hope Andrew can help fix it :)

Otherwise LGTM

Reviewed-by: Dev Jain <dev.jain@xxxxxxx>

Thanks.