Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries

From: Peter Xu
Date: Fri Jul 15 2022 - 19:04:35 EST


On Fri, Jul 15, 2022 at 02:52:27PM -0700, Axel Rasmussen wrote:
> Guest access in terms of "physical" memory address is basically
> random. So, actually filling in all 262k 4K PTEs making up a
> contiguous 1G region might take quite some time. Once we've completed
> any of the various 2M contiguous regions, it would be nice to go ahead
> and collapse those right away. The benefit is, the guest will see some
> performance benefit from the 2M page already, without having to wait
> for the full 1G page to complete. Once we do complete a 1G page, it
> would be nice to collapse that one level further. If we do this, the
> whole guest memory will be a mix of 1G, 2M, and 4K.
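The numbers in the quoted text follow from the x86-64 page sizes. 1G / 4K = 262,144 PTEs, which split into 512 2M chunks of 512 PTEs each. The sketch below shows one way to notice, as 4K pages are populated in arbitrary order, that a 2M chunk or the whole 1G region has become complete and could be collapsed. It rests on one assumption: `struct gb_region`, `gb_region_init()` and `populate_4k()` are hypothetical names for illustration only, not kernel structures or APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* x86-64 page sizes: 4K base pages, 2M and 1G huge pages. */
#define SZ_4K 4096ULL
#define SZ_2M (2ULL * 1024 * 1024)
#define SZ_1G (1024ULL * 1024 * 1024)

#define PTES_PER_2M (SZ_2M / SZ_4K) /* 512 */
#define PMDS_PER_1G (SZ_1G / SZ_2M) /* 512 */
#define PTES_PER_1G (SZ_1G / SZ_4K) /* 262144 */

/* Mapping size a region can currently use. */
enum level { LVL_4K, LVL_2M, LVL_1G };

/* Hypothetical bookkeeping for one 1G region (not a kernel structure). */
struct gb_region {
	uint64_t bitmap[PTES_PER_1G / 64]; /* which 4K pages are populated */
	uint16_t filled[PMDS_PER_1G];      /* populated 4K pages per 2M chunk */
	unsigned int collapsed_2m;         /* number of complete 2M chunks */
	bool collapsed_1g;                 /* whole 1G region complete */
};

void gb_region_init(struct gb_region *r)
{
	memset(r, 0, sizeof(*r));
}

/*
 * Record that 4K page 'idx' (0 .. PTES_PER_1G - 1) is now populated, and
 * return the largest mapping size usable for the page's range.
 * Populating the same page twice is harmless.
 */
enum level populate_4k(struct gb_region *r, uint64_t idx)
{
	assert(idx < PTES_PER_1G);
	uint64_t word = idx / 64, bit = idx % 64;
	uint64_t chunk = idx / PTES_PER_2M;

	if (!(r->bitmap[word] & (1ULL << bit))) {
		r->bitmap[word] |= 1ULL << bit;
		/* Last missing 4K page of this 2M chunk: collapse to 2M now. */
		if (++r->filled[chunk] == PTES_PER_2M &&
		    ++r->collapsed_2m == PMDS_PER_1G)
			r->collapsed_1g = true; /* all 512 chunks done: 1G */
	}

	if (r->collapsed_1g)
		return LVL_1G;
	if (r->filled[chunk] == PTES_PER_2M)
		return LVL_2M;
	return LVL_4K;
}
```

Fault order does not matter here, because each 2M chunk only counts its own populated pages. A chunk therefore becomes collapsible as soon as its last 4K page arrives, without waiting for the rest of the 1G region.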

Just to mention that there are quite a few other things that drag
performance down much more than TLB hits due to page size during any VM
migration process.

For example, when we split and write-protect pages during the starting
phase of migration on the source host, the drop is not 10% or 20% but much
more drastic. In the postcopy case it happens on the destination instead,
but it is still part of the whole migration process and is probably
noticeable to the guest too. If the guest wants, it can simply start
writing some pages continuously, and I bet it will see obvious slowdowns
at any time during migration.

It would always be nice to have multi-level sub-mappings, and I fully
agree. IMHO the question is whether keeping 4K-only would greatly simplify
the work, especially the rework of the hugetlb sub-page-aware pgtable ops.

Thanks,

--
Peter Xu