Re: [PATCH] mm/khugepaged: increase transparent_hugepage_recommend_disable parameter to disable active modification of min_free_kbytes

From: Yang Shi
Date: Fri Sep 01 2023 - 13:29:28 EST


On Thu, Aug 31, 2023 at 7:30 AM Liu Song <liusong@xxxxxxxxxxxxxxxxx> wrote:
>
> On 2023/8/30 04:04, Yang Shi wrote:
>
> > On Wed, Aug 16, 2023 at 8:52 PM Liu Song <liusong@xxxxxxxxxxxxxxxxx> wrote:
> >> In the arm64 environment, when PAGESIZE is 4K, the "pageblock_nr_pages"
> >> value is 512, and the recommended min_free_kbytes in
> >> "set_recommended_min_free_kbytes" usually does not exceed 44MB.
> >>
> >> However, when PAGESIZE is 64K, the "pageblock_nr_pages" value is 8192,
> >> and the recommended min_free_kbytes in "set_recommended_min_free_kbytes"
> >> is 8192 * 2 * (2 + 9) * 64K, which directly increases to 11GB.
> >>
> >> According to this calculation method, due to the modification of min_free_kbytes,
> >> the reserved memory in my 128GB memory environment reaches 10GB, and MemAvailable
> >> is correspondingly reduced by 10GB.
> >>
> >> In the case of PAGESIZE 64K, transparent hugepages are 512MB, and we only
> >> need them to be used on demand. If transparent hugepages cannot be allocated,
> >> falling back to regular 64K pages is completely acceptable.
> >>
> >> Therefore, we added the transparent_hugepage_recommend_disable parameter
> >> to disable active modification of min_free_kbytes, thereby meeting our
> >> requirements for transparent hugepages in the 64K scenario, and it will
> >> not excessively reduce the available memory.
> > Thanks for debugging this. I agree 11GB for min_free_kbytes is too
> > much. But a kernel parameter sounds like overkill to me too. IMHO we
> > just need better scaling for bigger base page sizes. For
> > example, we could keep just one or two pageblocks for min_free_kbytes
> > when the base page size is bigger than 4K.
> >
> Thank you very much for your advice, but how do we determine the number
> of pageblocks?

TBH, I can't tell. I don't have a magic number...
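
For reference, the figures in the quoted commit message can be reproduced with a small userspace sketch. It only models the arithmetic and is not the kernel code: the helper name recommended_min_kbytes() is made up for illustration, two populated zones are assumed (as the "2 *" factor in the commit message implies), MIGRATE_PCPTYPES is taken to be 3, and the in-kernel cap at 5% of lowmem is left out.

```c
#include <assert.h>

/* Assumptions for illustration, not values read from a running kernel. */
#define NR_ZONES         2UL /* populated zones implied by the commit message */
#define MIGRATE_PCPTYPES 3UL /* unmovable, movable, reclaimable */

/*
 * Userspace model of the recommendation: 2 pageblocks per zone, plus
 * MIGRATE_PCPTYPES^2 pageblocks per zone, converted from pages to KiB.
 * The kernel's cap at 5% of lowmem is intentionally not modelled.
 */
static unsigned long recommended_min_kbytes(unsigned long pageblock_nr_pages,
					    unsigned int page_shift)
{
	unsigned long pages;

	assert(page_shift >= 10); /* pages must be at least 1 KiB */

	pages = pageblock_nr_pages * NR_ZONES * 2;
	pages += pageblock_nr_pages * NR_ZONES *
		 MIGRATE_PCPTYPES * MIGRATE_PCPTYPES;

	return pages << (page_shift - 10);
}
```

With 4K pages (pageblock_nr_pages = 512, page_shift = 12) this gives 45056 kB, i.e. 44MB. With 64K pages (pageblock_nr_pages = 8192, page_shift = 16) it gives 11534336 kB, i.e. 11GB, matching the numbers above.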

> This is a difficult number to determine. When PAGESIZE is 64K, arm64
> supports hugepages of 2M, 512M, and 16G, which can meet the
> requirements of scenarios that require hugepages.
>
> However, transparent huge pages can only be 512M, and 512M is a very
> large size, so enabling transparent huge pages should be carefully
> considered, not to mention whether it makes sense to reserve such a
> large amount of memory.
>
> Therefore, I think that in the 64K PAGESIZE scenario, it might also
> be a good choice to skip set_recommended_min_free_kbytes entirely?

It should be ok too. AFAIK there shouldn't be many real-life workloads
running with this configuration, other than some Java users who run
workloads with 64K base pages + THP, and that should not be very
common.
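
To make the two options discussed here concrete, below is a rough userspace sketch, not a proposed patch. Everything in it is hypothetical: the enum, the helper large_page_min_kbytes(), and especially PLACEHOLDER_PAGEBLOCKS, which is set to 1 only because some value is needed (no number has been agreed on in this thread). For 4K base pages it keeps the existing 2 + 3 * 3 pageblocks per zone.

```c
#include <assert.h>

/* Hypothetical policies for base page sizes larger than 4K. */
enum large_page_policy {
	POLICY_SCALE_DOWN, /* keep a small fixed number of pageblocks per zone */
	POLICY_SKIP,       /* make no recommendation at all */
};

#define BASE_4K_SHIFT          12U
#define PLACEHOLDER_PAGEBLOCKS 1UL /* placeholder only, not an agreed value */

/*
 * Returns the recommended min_free_kbytes in KiB, or 0 when no
 * recommendation is made and min_free_kbytes would be left untouched.
 */
static unsigned long large_page_min_kbytes(unsigned long pageblock_nr_pages,
					   unsigned long nr_zones,
					   unsigned int page_shift,
					   enum large_page_policy policy)
{
	unsigned long pages;

	assert(page_shift >= 10); /* pages must be at least 1 KiB */

	if (page_shift <= BASE_4K_SHIFT) {
		/* 4K base pages: existing formula, 2 + 3 * 3 pageblocks per zone */
		pages = pageblock_nr_pages * nr_zones * (2 + 3 * 3);
	} else if (policy == POLICY_SKIP) {
		return 0;
	} else {
		pages = pageblock_nr_pages * nr_zones * PLACEHOLDER_PAGEBLOCKS;
	}

	return pages << (page_shift - 10);
}
```

Note that with 64K base pages and two zones, even a single 512M pageblock per zone still comes to 1048576 kB (1GB), which is part of why picking a number is hard.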

>
> Thanks
>
>
> >> Signed-off-by: Liu Song <liusong@xxxxxxxxxxxxxxxxx>
> >> ---
> >> .../admin-guide/kernel-parameters.txt | 5 +++++
> >> mm/khugepaged.c | 20 ++++++++++++++++++-
> >> 2 files changed, 24 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 654d0d921101..612bdf601cce 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -6553,6 +6553,11 @@
> >> See Documentation/admin-guide/mm/transhuge.rst
> >> for more details.
> >>
> >> + transparent_hugepage_recommend_disable
> >> + [KNL,THP]
> >> + Can be used to disable transparent hugepage to actively modify
> >> + /proc/sys/vm/min_free_kbytes during enablement process.
> >> +
> >> trusted.source= [KEYS]
> >> Format: <string>
> >> This parameter identifies the trust source as a backend
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index 78fc1a24a1cc..ac40c618f4f6 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -88,6 +88,9 @@ static unsigned int khugepaged_max_ptes_none __read_mostly;
> >> static unsigned int khugepaged_max_ptes_swap __read_mostly;
> >> static unsigned int khugepaged_max_ptes_shared __read_mostly;
> >>
> >> +/* default enable recommended */
> >> +static unsigned int transparent_hugepage_recommend __read_mostly = 1;
> >> +
> >> #define MM_SLOTS_HASH_BITS 10
> >> static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
> >>
> >> @@ -2561,6 +2564,11 @@ static void set_recommended_min_free_kbytes(void)
> >> goto update_wmarks;
> >> }
> >>
> >> + if (!transparent_hugepage_recommend) {
> >> + pr_info("do not allow to recommend modify min_free_kbytes\n");
> >> + return;
> >> + }
> >> +
> >> for_each_populated_zone(zone) {
> >> /*
> >> * We don't need to worry about fragmentation of
> >> @@ -2591,7 +2599,10 @@ static void set_recommended_min_free_kbytes(void)
> >>
> >> if (recommended_min > min_free_kbytes) {
> >> if (user_min_free_kbytes >= 0)
> >> - pr_info("raising min_free_kbytes from %d to %lu to help transparent hugepage allocations\n",
> >> + pr_info("raising user specified min_free_kbytes from %d to %lu to help transparent hugepage allocations\n",
> >> + min_free_kbytes, recommended_min);
> >> + else
> >> + pr_info("raising default min_free_kbytes from %d to %lu to help transparent hugepage allocations\n",
> >> min_free_kbytes, recommended_min);
> >>
> >> min_free_kbytes = recommended_min;
> >> @@ -2601,6 +2612,13 @@ static void set_recommended_min_free_kbytes(void)
> >> setup_per_zone_wmarks();
> >> }
> >>
> >> +static int __init setup_transparent_hugepage_recommend_disable(char *str)
> >> +{
> >> + transparent_hugepage_recommend = 0;
> >> + return 1;
> >> +}
> >> +__setup("transparent_hugepage_recommend_disable", setup_transparent_hugepage_recommend_disable);
> >> +
> >> int start_stop_khugepaged(void)
> >> {
> >> int err = 0;
> >> --
> >> 2.19.1.6.gb485710b
> >>
> >>