Re: [PATCH v2 00/46] hugetlb: introduce HugeTLB high-granularity mapping

From: David Hildenbrand
Date: Thu Feb 23 2023 - 11:19:53 EST


On 23.02.23 16:53, James Houghton wrote:
On Thu, Feb 23, 2023 at 1:07 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 22.02.23 21:57, Mina Almasry wrote:
On Wed, Feb 22, 2023 at 7:49 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 21.02.23 22:46, Mike Kravetz wrote:
On 02/18/23 00:27, James Houghton wrote:
This series introduces the concept of HugeTLB high-granularity mapping
(HGM). This series teaches HugeTLB how to map HugeTLB pages at
high-granularity, similar to how THPs can be PTE-mapped.

Support for HGM in this series is for MAP_SHARED VMAs on x86_64 only. Other
architectures and (some) support for MAP_PRIVATE will come later.

This series is based on latest mm-unstable (ccd6a73daba9).

Notable changes with this series
================================

- hugetlb_add_file_rmap / hugetlb_remove_rmap are added to handle
  mapcounting for non-anon hugetlb.
- The mapcounting scheme uses subpages' mapcounts for high-granularity
  mappings, but it does not use subpages_mapcount(). This scheme
  prevents the HugeTLB VMEMMAP optimization from being used, so it
  will be improved in a later series.
- page_add_file_rmap and page_remove_rmap are updated so they can be
  used by hugetlb_add_file_rmap / hugetlb_remove_rmap.
- MADV_SPLIT has been added to enable the userspace API changes that
  HGM allows for: high-granularity UFFDIO_CONTINUE (and maybe other
  changes in the future). MADV_SPLIT does NOT force all the mappings to
  be PAGE_SIZE.
- MADV_COLLAPSE is expanded to include HugeTLB mappings.
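
For a sense of scale, here is a rough sketch (Python, not taken from the
series) of how many PAGE_SIZE pieces a hugepage breaks into on x86_64.
It assumes PAGE_SIZE is the finest granularity that a high-granularity
mapping could reach, which is how the MADV_SPLIT note above reads. The
helper name is made up for illustration.

```python
# Standard x86_64 sizes. The helper name is illustrative only and does
# not correspond to anything in the patch series.
PAGE_SIZE = 4 * 1024
HUGE_PAGE_SIZES = {"2M": 2 * 1024**2, "1G": 1024**3}


def base_pages_per_hugepage(hugepage_size: int, page_size: int = PAGE_SIZE) -> int:
    """Return how many page_size pieces make up one hugepage."""
    if hugepage_size % page_size:
        raise ValueError("hugepage size must be a multiple of the base page size")
    return hugepage_size // page_size


for name, size in HUGE_PAGE_SIZES.items():
    print(f"{name} hugepage = {base_pages_per_hugepage(size)} PAGE_SIZE pieces")
```

So a single 1G page corresponds to 262144 base pages, which is the gap
between mapping the whole hugepage at once and mapping it piecewise.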

Old versions:
v1: https://lore.kernel.org/linux-mm/20230105101844.1893104-1-jthoughton@xxxxxxxxxx/
RFC v2: https://lore.kernel.org/linux-mm/20221021163703.3218176-1-jthoughton@xxxxxxxxxx/
RFC v1: https://lore.kernel.org/linux-mm/20220624173656.2033256-1-jthoughton@xxxxxxxxxx/

Changelog:
v1 -> v2 (thanks Peter for all your suggestions!):
- Changed mapcount to be more THP-like, and made HGM incompatible with
  HVO.
- HGM is now disabled by default to leave HVO enabled by default.

I understand the reasoning behind the move to THP-like mapcounting, and the
incompatibility with HVO. However, I just got to patch 5 and realized either
HGM or HVO will need to be chosen at kernel build time. That may not be an
issue for cloud providers or others building their own kernels for internal
use. However, distro kernels will need to pick one option or the other.
Right now, my Fedora desktop has HVO enabled so it would likely not have
HGM enabled. That is not a big deal for a desktop.
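
A minimal sketch of checking which of the two a given kernel config
picked. CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is the existing HVO symbol.
The HGM symbol name below is an assumption about what the series calls
it and should be checked against patch 5. The function names are
hypothetical.

```python
HVO_SYMBOL = "CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP"
# Assumed name for the HGM option; verify against the series itself.
HGM_SYMBOL = "CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING"


def enabled_symbols(config_text: str) -> set:
    """Collect symbols set to 'y' or 'm' in kernel .config style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and value in ("y", "m"):
            enabled.add(name)
    return enabled


def hugetlb_choice(config_text: str) -> dict:
    """Report whether HVO and (assumed symbol) HGM are enabled."""
    symbols = enabled_symbols(config_text)
    return {"HVO": HVO_SYMBOL in symbols, "HGM": HGM_SYMBOL in symbols}


sample = """\
CONFIG_HUGETLB_PAGE=y
CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
# CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING is not set
"""
print(hugetlb_choice(sample))
```

On a distro install the text would typically come from the config file
shipped alongside the kernel image, assuming the distro provides one.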

Just curious, do we have distro kernel users that want to use HGM?

Most certainly I would say :)

I'm not sure. Maybe distros want the hwpoison benefits HGM provides?
But that's not implemented in this series.

From what I can tell, HGM helps to improve live migration of VMs with
gigantic pages. That sounds like a good reason why distros (that support
virtualization) might want it, independently of the hwpoison changes.

--
Thanks,

David / dhildenb