Re: [PATCH v9 4/4] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl

From: Muchun Song
Date: Sat May 07 2022 - 09:11:00 EST


On Fri, May 06, 2022 at 09:50:53AM -0700, Mike Kravetz wrote:
> On 5/5/22 19:49, Muchun Song wrote:
> > On Thu, May 05, 2022 at 09:48:34AM -0700, Mike Kravetz wrote:
> >> On 5/5/22 01:02, Muchun Song wrote:
> >>> On Wed, May 04, 2022 at 08:36:00PM -0700, Mike Kravetz wrote:
> >>>> On 5/4/22 19:35, Muchun Song wrote:
> >>>>> On Wed, May 04, 2022 at 03:12:39PM -0700, Mike Kravetz wrote:
> >>>>>> On 4/29/22 05:18, Muchun Song wrote:
> >>>>>>> +static void vmemmap_optimize_mode_switch(enum vmemmap_optimize_mode to)
> >>>>>>> +{
> >>>>>>> +	if (vmemmap_optimize_mode == to)
> >>>>>>> +		return;
> >>>>>>> +
> >>>>>>> +	if (to == VMEMMAP_OPTIMIZE_OFF)
> >>>>>>> +		static_branch_dec(&hugetlb_optimize_vmemmap_key);
> >>>>>>> +	else
> >>>>>>> +		static_branch_inc(&hugetlb_optimize_vmemmap_key);
> >>>>>>> +	vmemmap_optimize_mode = to;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static int __init hugetlb_vmemmap_early_param(char *buf)
> >>>>>>>  {
> >>>>>>>  	bool enable;
> >>>>>>> +	enum vmemmap_optimize_mode mode;
> >>>>>>>
> >>>>>>>  	if (kstrtobool(buf, &enable))
> >>>>>>>  		return -EINVAL;
> >>>>>>>
> >>>>>>> -	if (enable)
> >>>>>>> -		static_branch_enable(&hugetlb_optimize_vmemmap_key);
> >>>>>>> -	else
> >>>>>>> -		static_branch_disable(&hugetlb_optimize_vmemmap_key);
> >>>>>>> +	mode = enable ? VMEMMAP_OPTIMIZE_ON : VMEMMAP_OPTIMIZE_OFF;
> >>>>>>> +	vmemmap_optimize_mode_switch(mode);
> >>>>>>>
> >>>>>>>  	return 0;
> >>>>>>>  }
> >>>>>>> @@ -60,6 +80,8 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
> >>>>>>>  	vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
> >>>>>>>  	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
> >>>>>>>
> >>>>>>> +	VM_BUG_ON_PAGE(!vmemmap_pages, head);
> >>>>>>> +
> >>>>>>>  	/*
> >>>>>>>  	 * The pages which the vmemmap virtual address range [@vmemmap_addr,
> >>>>>>>  	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
> >>>>>>> @@ -69,8 +91,10 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
> >>>>>>>  	 */
> >>>>>>>  	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
> >>>>>>>  				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> >>>>>>> -	if (!ret)
> >>>>>>> +	if (!ret) {
> >>>>>>>  		ClearHPageVmemmapOptimized(head);
> >>>>>>> +		static_branch_dec(&hugetlb_optimize_vmemmap_key);
> >>>>>>> +	}
> >>>>>>>
> >>>>>>>  	return ret;
> >>>>>>>  }
> >>>>>>> @@ -84,6 +108,8 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
> >>>>>>>  	if (!vmemmap_pages)
> >>>>>>>  		return;
> >>>>>>>
> >>>>>>> +	static_branch_inc(&hugetlb_optimize_vmemmap_key);
> >>>>>>
> >>>>>> Can you explain the reasoning behind doing the static_branch_inc here in free,
> >>>>>> and static_branch_dec in alloc?
> >>>>>> IIUC, they may not be absolutely necessary but you could use the count to
> >>>>>> know how many optimized pages are in use? Or, I may just be missing
> >>>>>> something.
> >>>>>>
> >>>>>
> >>>>> Partly right. One 'count' is not enough. I implemented this with a similar
> >>>>> approach in v6 [1]. Besides the 'count', we also need a lock for synchronization.
> >>>>> However, both the count and the synchronization are already provided by the
> >>>>> static_key_inc/dec infrastructure. It is simpler to use static_key_inc/dec
> >>>>> directly, right?
> >>>>>
> >>>>> [1] https://lore.kernel.org/all/20220330153745.20465-5-songmuchun@xxxxxxxxxxxxx/
> >>>>>
> >>>>
> >>>> Sorry, but I am a little confused.
> >>>>
> >>>> vmemmap_optimize_mode_switch will static_key_inc to enable and static_key_dec
> >>>> to disable. In addition each time we optimize (allocate) a hugetlb page after
> >>>> enabling we will static_key_inc.
> >>>>
> >>>> Suppose we have 1 hugetlb page optimized. So static count == 2 IIUC.
> >>>> Then someone turns off optimization via sysctl. static count == 1 ???
> >>>
> >>> Definitely right.
> >>>
> >>>> If we then add another hugetlb page via nr_hugepages it seems that it
> >>>> would be optimized as static count == 1. Is that correct? Do we need
> >>>
> >>> I'm wrong.
> >>>
> >>>> to free all hugetlb pages with optimization before we can add new pages
> >>>> without optimization?
> >>>>
> >>>
> >>> My bad. I think the following code would fix this.
> >>>
> >>> Thanks for your careful review.
> >>>
> >>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> >>> index 5820a681a724..997e192aeed7 100644
> >>> --- a/mm/hugetlb_vmemmap.c
> >>> +++ b/mm/hugetlb_vmemmap.c
> >>> @@ -105,7 +105,7 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
> >>>  	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
> >>>
> >>>  	vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
> >>> -	if (!vmemmap_pages)
> >>> +	if (!vmemmap_pages || READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
> >>>  		return;
> >>>
> >>>  	static_branch_inc(&hugetlb_optimize_vmemmap_key);
> >>>
> >>
> >> If vmemmap_optimize_mode == VMEMMAP_OPTIMIZE_OFF is sufficient for turning
> >> off optimizations, do we really need to static_branch_inc/dec for each
> >> hugetlb page?
> >>
> >
> > static_branch_inc/dec is necessary since the user could change
> > vmemmap_optimize_mode to off after the 'if' check.
> >
> > CPU0:                                   CPU1:
> > // Assume vmemmap_optimize_mode == 1
> > // and static_key_count == 1
> > if (vmemmap_optimize_mode == VMEMMAP_OPTIMIZE_OFF)
> >     return;
> >                                         hugetlb_optimize_vmemmap_handler();
> >                                             vmemmap_optimize_mode = 0;
> >                                             static_branch_dec();
> >                                             // static_key_count == 0
> > // Enable static_key if necessary
> > static_branch_inc();
> >
> > Does this make sense to you?
>
> Yes, it makes sense and is required because hugetlb_optimize_vmemmap_pages()
> performs two functions:
> 1) It determines if vmemmap_optimization is enabled
> 2) It specifies how many vmemmap pages can be saved with optimization
> hugetlb_optimize_vmemmap_pages returns 0 if static_key_count == 0, so this
> would cause problems in places such as hugetlb free path (hugetlb_vmemmap_alloc).
> I hope my understanding is correct?
>

Right.
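
To make the counting concrete, here is a rough userspace model of the
scheme (not kernel code). key_count stands in for the static key, 7 is an
arbitrary page count, and the mode_switch/model_* names are invented for
this sketch. The mode holds one key reference and every optimized page
holds another, so switching the mode off leaves the key enabled until the
last optimized page has been restored.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the static key: a plain reference count. */
static int key_count;

static bool key_enabled(void)
{
	return key_count > 0;
}

enum optimize_mode { OPTIMIZE_OFF, OPTIMIZE_ON };
static enum optimize_mode mode = OPTIMIZE_OFF;

/* Mirrors vmemmap_optimize_mode_switch(): the mode holds one reference. */
static void mode_switch(enum optimize_mode to)
{
	if (mode == to)
		return;
	if (to == OPTIMIZE_OFF)
		key_count--;
	else
		key_count++;
	mode = to;
}

/* Mirrors hugetlb_optimize_vmemmap_pages(): 0 when the key is disabled. */
static unsigned int optimize_pages(void)
{
	return key_enabled() ? 7 : 0;	/* 7 is an illustrative value */
}

struct model_page {
	bool optimized;	/* models HPageVmemmapOptimized */
};

/* Mirrors hugetlb_vmemmap_free(): optimize and take a per-page reference. */
static void model_optimize(struct model_page *page)
{
	if (!optimize_pages() || mode == OPTIMIZE_OFF)
		return;
	key_count++;
	page->optimized = true;
}

/* Mirrors hugetlb_vmemmap_alloc(): restore and drop the page's reference. */
static bool model_restore(struct model_page *page)
{
	if (!page->optimized)
		return true;
	if (!optimize_pages())
		return false;	/* the case VM_BUG_ON_PAGE() guards against */
	page->optimized = false;
	key_count--;
	return true;
}
```

With mode_switch(OPTIMIZE_ON), one model_optimize() and then
mode_switch(OPTIMIZE_OFF), key_count goes 1, 2, 1. A later model_optimize()
on another page is a no-op, and model_restore() on the first page still
succeeds and drops the count to 0.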

> Would it make the code more clear if we did not do the check for
> vmemmap_optimization in hugetlb_optimize_vmemmap_pages()? Instead:
> - hugetlb_optimize_vmemmap_pages ALWAYS returns the number of vmemmap pages
>   that can be freed/optimized
> - At hugetlb allocation time (hugetlb_vmemmap_free) we only check
>   hugetlb_optimize_vmemmap_enabled() to determine if optimization should
>   be performed.
> - After hugetlb_vmemmap_free, we can use HPageVmemmapOptimized to determine
>   if vmemmap pages need to be allocated in hugetlb freeing paths.
>

I think this works as well. My initial consideration was that
embedding hugetlb_optimize_vmemmap_enabled() in
hugetlb_optimize_vmemmap_pages() could make the caller (e.g.
flush_free_hpage_work()) of hugetlb_optimize_vmemmap_pages()
more efficient when static_key == 0. Maybe I could add
the check for vmemmap_optimization to flush_free_hpage_work()
and then remove the check from hugetlb_optimize_vmemmap_pages().
Will do this in a new version.
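
A rough sketch of how that split might look, again as a userspace model
rather than the real patch. key_count, the constant 7 and the model_*
names are assumptions for illustration only, and the question of whether
the per-page key references stay is deliberately left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the static key reference count. */
static int key_count;

/* Models hugetlb_optimize_vmemmap_enabled(). */
static bool optimize_enabled(void)
{
	return key_count > 0;
}

/* Models hugetlb_optimize_vmemmap_pages() after the split: no enabled check. */
static unsigned int optimize_pages(void)
{
	return 7;	/* illustrative value only */
}

struct model_page {
	bool optimized;	/* models HPageVmemmapOptimized */
};

/* Models hugetlb_vmemmap_free(): the only place that checks "enabled". */
static void model_optimize(struct model_page *page)
{
	if (!optimize_pages() || !optimize_enabled())
		return;
	page->optimized = true;
}

/* Models hugetlb_vmemmap_alloc(): the per-page flag alone decides. */
static void model_restore(struct model_page *page)
{
	if (!page->optimized)
		return;
	/* Always holds here, since the page count no longer depends on the key. */
	assert(optimize_pages() != 0);
	page->optimized = false;
}
```

In this model a page optimized while the key was enabled can still be
restored after the key drops to 0, because model_restore() looks only at
the per-page flag.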

Thanks.

> Perhaps, there is something wrong with the above suggestion?
>
> I know you have always had hugetlb_optimize_vmemmap_pages perform the two
> functions. So, splitting functionality may not be more clear for you. I am
> OK leaving code as is (key inc/dec for each page). Just wanted to get your
> (and perhaps others') thoughts on splitting functionality as described above.
> --
> Mike Kravetz
>