Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range

From: Peter Xu
Date: Thu Jan 19 2023 - 17:43:05 EST


On Thu, Jan 19, 2023 at 02:00:32PM -0800, Mike Kravetz wrote:
> I do not know much about the (primary) live migration use case. My
> guess is that page table lock contention may be an issue? In this use
> case, HGM is only enabled for the duration of the live migration operation,
> then a MADV_COLLAPSE is performed. If contention is likely to be an
> issue during this time, then yes we would need to pass around
> something like hugetlb_pte.

I'm not aware of any such contention issue. IMHO the migration problem is
mainly that transferring such a large page is too slow. Shrinking the page
size should already resolve the major problem here IIUC.

AFAIU a 4K-only solution should only reduce lock contention, because locks
will always be pte-level if VM_HUGETLB_HGM is set. When walking and creating
the intermediate pgtable entries we can use atomic ops just like generic
mm, so no lock is needed at all. With uncertainty about the size of
mappings, we'll need to take any of the multiple layers of locks.
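
To illustrate what "atomic ops, no lock" could mean when creating an
intermediate entry, here is a minimal userspace sketch using C11 atomics.
It is not kernel code and not taken from the HGM series: struct table,
table_new() and get_or_install() are hypothetical names, and a real page
table walker has more to worry about (freeing, TLB, arch specifics). The
only point is that the racing installers agree on one winner with a single
compare-and-swap instead of a lock.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define ENTRIES 512	/* entries per level, as on x86-64 with 4K pages */

/* Hypothetical stand-in for one page table level. */
struct table {
	_Atomic(struct table *) slot[ENTRIES];
};

/* Allocate a table with every slot empty (NULL). */
static struct table *table_new(void)
{
	struct table *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	for (unsigned int i = 0; i < ENTRIES; i++)
		atomic_init(&t->slot[i], NULL);
	return t;
}

/*
 * Return the next-level table at parent->slot[idx], creating it if needed.
 * No lock is taken: if two callers race, exactly one cmpxchg succeeds and
 * the loser frees its allocation and uses the winner's table.
 */
static struct table *get_or_install(struct table *parent, unsigned int idx)
{
	struct table *cur, *fresh, *expected = NULL;

	cur = atomic_load_explicit(&parent->slot[idx], memory_order_acquire);
	if (cur)
		return cur;

	fresh = table_new();
	if (!fresh)
		return NULL;

	if (atomic_compare_exchange_strong_explicit(&parent->slot[idx],
						    &expected, fresh,
						    memory_order_acq_rel,
						    memory_order_acquire))
		return fresh;

	/* Lost the race: 'expected' now holds the table someone else installed. */
	free(fresh);
	return expected;
}
```

Calling get_or_install(root, idx) twice for the same idx returns the same
pointer, whichever caller got there first.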

[...]

> > None of these complexities are particularly major in my opinion.
>
> Perhaps not. I was just thinking about the overall complexity of the
> hugetlb code after HGM. Currently, it is 'relatively simple' with
> fixed huge page sizes. IMO, much simpler than THP with two possible
> mapping sizes. With HGM and intermediate mapping sizes, it seems
> things could get more complicated than THP. Perhaps it is just me.

Please count me in. :) I'm still happy to see what it'll look like if
James thinks that complexity doesn't greatly affect the whole design.

Thanks,

--
Peter Xu