[PATCH 0/3] THP Shrinker

From: alexlzhu
Date: Wed Sep 28 2022 - 02:25:27 EST


From: Alexander Zhu <alexlzhu@xxxxxx>

Transparent Hugepages (THPs) use a page size of 2MB, compared with the
normal page size of 4KB. A larger page size results in fewer TLB misses
and thus more efficient use of the CPU. It also results in more memory
waste, which can hurt performance in some use cases. THPs are currently
enabled in the Linux kernel by applications in limited virtual address
ranges via the madvise system call. The THP shrinker tries to find a
balance between increased use of THPs and increased use of memory. It
reduces memory usage by splitting the underutilized THPs that are
identified by the thp_utilization scanner and freeing their zero-filled
subpages.
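
For readers unfamiliar with the madvise path mentioned above, a minimal
userspace sketch follows. It assumes a Linux system whose headers define
MADV_HUGEPAGE; map_with_thp_hint() is a hypothetical helper name used only
for illustration and is not part of this patchset.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Map `len` bytes of anonymous memory and hint that the range may be
 * backed by THPs. Returns NULL if the mapping itself fails.
 * Hypothetical helper, for illustration only.
 */
static void *map_with_thp_hint(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
#ifdef MADV_HUGEPAGE
	/* Advisory only: the kernel may still back the range with 4KB pages. */
	if (madvise(p, len, MADV_HUGEPAGE) != 0)
		perror("madvise(MADV_HUGEPAGE)");
#endif
	return p;
}
```

A caller would typically pass a length that is a multiple of 2MB and
release the range with munmap() when done. The hint does not guarantee
that any THP is actually allocated.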

In our experiments we have noticed that the least utilized THPs are almost
entirely unutilized.

Sample Output:

Utilized[0-50]: 1331 680884
Utilized[51-101]: 9 3983
Utilized[102-152]: 3 1187
Utilized[153-203]: 0 0
Utilized[204-255]: 2 539
Utilized[256-306]: 5 1135
Utilized[307-357]: 1 192
Utilized[358-408]: 0 0
Utilized[409-459]: 1 57
Utilized[460-512]: 400 13
Last Scan Time: 223.98s
Last Scan Duration: 70.65s

Above is a sample obtained from one of our test machines when THP is always
enabled. For each bucket, the first column is the number of THPs and the
second column is the number of zero pages within those THPs. The 1331 THPs
in this thp_utilization sample that have 0-50 utilized subpages contain
680884 zero pages. This comes out to 680884 / (512 * 1331) = 99.91% zero
pages in the least utilized bucket, which represents 680884 * 4KB = 2.7GB
of wasted memory.
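
The arithmetic above can be checked with a short userspace sketch. The
helper and macro names below are illustrative assumptions, not identifiers
from the patchset; the inputs are the two columns of the [0-50] bucket.

```c
#include <assert.h>

#define SUBPAGES_PER_THP 512UL	/* 2MB THP / 4KB subpage */
#define SUBPAGE_SIZE_KB  4UL

/* Percentage of the subpages in a bucket that are zero pages. */
static double zero_page_percent(unsigned long nr_thps,
				unsigned long nr_zero_pages)
{
	assert(nr_thps > 0);
	return 100.0 * (double)nr_zero_pages /
	       (double)(SUBPAGES_PER_THP * nr_thps);
}

/* Memory held by the zero pages of a bucket, in KB. */
static unsigned long zero_pages_kb(unsigned long nr_zero_pages)
{
	return nr_zero_pages * SUBPAGE_SIZE_KB;
}
```

For the sample, zero_page_percent(1331, 680884) evaluates to roughly 99.91
and zero_pages_kb(680884) to 2723536 KB, which is about 2.7GB.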

Also note that the vast majority of THPs are in either the least utilized
[0-50] or the most utilized [460-512] bucket. The least utilized THPs are
responsible for almost all of the memory waste when THP is always
enabled. Thus by splitting the THPs in the lowest utilization bucket we
recover almost all of the wasted memory while keeping most of the
improvement in CPU efficiency. We have seen similar results on our
production hosts.
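
As an illustration of how the bucket ranges in the sample output can be
derived, the sketch below splits the 0-512 range into 10 buckets with
integer division. This is one formula that reproduces the printed ranges;
it is an assumption and may differ from what the thp_utilization patch
does. All names are hypothetical.

```c
#include <assert.h>

#define HPAGE_NR_SUBPAGES  512	/* 4KB subpages per 2MB THP */
#define THP_UTIL_BUCKET_NR 10	/* number of buckets in the sample */

/* First utilized-subpage count that falls into bucket i. */
static int bucket_start(int i)
{
	assert(i >= 0 && i < THP_UTIL_BUCKET_NR);
	return i * HPAGE_NR_SUBPAGES / THP_UTIL_BUCKET_NR;
}

/* Last utilized-subpage count in bucket i; the last bucket includes 512. */
static int bucket_end(int i)
{
	assert(i >= 0 && i < THP_UTIL_BUCKET_NR);
	if (i == THP_UTIL_BUCKET_NR - 1)
		return HPAGE_NR_SUBPAGES;
	return (i + 1) * HPAGE_NR_SUBPAGES / THP_UTIL_BUCKET_NR - 1;
}

/* Bucket index for a THP with nr_utilized non-zero subpages. */
static int bucket_of(int nr_utilized)
{
	int i;

	assert(nr_utilized >= 0 && nr_utilized <= HPAGE_NR_SUBPAGES);
	for (i = 0; i < THP_UTIL_BUCKET_NR - 1; i++)
		if (nr_utilized <= bucket_end(i))
			return i;
	return THP_UTIL_BUCKET_NR - 1;
}
```

With these definitions, bucket 4 spans 204-255 and bucket 9 spans 460-512,
matching the labels in the sample output.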

This patchset introduces the THP shrinker we have developed to identify
and split the least utilized THPs. It includes the thp_utilization
changes that group anonymous THPs into buckets, the split_huge_page()
changes that identify and zap zero-filled 4KB pages within THPs, and the
shrinker changes. It should be noted that the split_huge_page() changes
are based on previous work done by Yu Zhao.
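
To make the notion of a zero-filled 4KB page concrete, here is a userspace
sketch that counts the zero-filled subpages in a 2MB buffer. It is not the
kernel code from this series, which operates on struct page and kernel
mappings; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SUBPAGE_SIZE     4096UL
#define SUBPAGES_PER_THP 512UL

/* Return 1 if every byte of the 4KB subpage is zero, 0 otherwise. */
static int subpage_is_zero(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < SUBPAGE_SIZE; i++)
		if (page[i] != 0)
			return 0;
	return 1;
}

/* Count the zero-filled 4KB subpages in a 2MB region. */
static size_t count_zero_subpages(const unsigned char *thp)
{
	size_t i, nr_zero = 0;

	assert(thp != NULL);
	for (i = 0; i < SUBPAGES_PER_THP; i++)
		nr_zero += subpage_is_zero(thp + i * SUBPAGE_SIZE);
	return nr_zero;
}
```

The number of utilized subpages of a region is then 512 minus the value
returned by count_zero_subpages().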

In the future, we intend to allow additional tuning of the shrinker
based on the workload, depending on CPU/IO/memory pressure and the
amount of anonymous memory. The long term goal is to eventually always
enable THP for all applications and deprecate madvise entirely.

In production we have so far observed a 2-3% reduction in overall CPU
usage on stateless web servers when THP is always enabled.

Alexander Zhu (3):
  mm: add thp_utilization metrics to debugfs
  mm: changes to split_huge_page() to free zero filled tail pages
  mm: THP low utilization shrinker

 Documentation/admin-guide/mm/transhuge.rst |   9 +
 include/linux/huge_mm.h                    |  10 +
 include/linux/list_lru.h                   |  24 ++
 include/linux/mm_types.h                   |   5 +
 include/linux/rmap.h                       |   2 +-
 include/linux/vm_event_item.h              |   3 +
 mm/huge_memory.c                           | 306 +++++++++++++++++-
 mm/list_lru.c                              |  49 +++
 mm/migrate.c                               |  72 ++++-
 mm/migrate_device.c                        |   4 +-
 mm/page_alloc.c                            |   6 +
 mm/vmstat.c                                |   3 +
 .../selftests/vm/split_huge_page_test.c    | 114 ++++++-
 tools/testing/selftests/vm/vm_util.c       |  23 ++
 tools/testing/selftests/vm/vm_util.h       |   1 +
 15 files changed, 613 insertions(+), 18 deletions(-)

--
2.30.2