Re: [PATCH v9 4/4] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl

From: Mike Kravetz
Date: Fri May 06 2022 - 12:51:30 EST


On 5/5/22 19:49, Muchun Song wrote:
> On Thu, May 05, 2022 at 09:48:34AM -0700, Mike Kravetz wrote:
>> On 5/5/22 01:02, Muchun Song wrote:
>>> On Wed, May 04, 2022 at 08:36:00PM -0700, Mike Kravetz wrote:
>>>> On 5/4/22 19:35, Muchun Song wrote:
>>>>> On Wed, May 04, 2022 at 03:12:39PM -0700, Mike Kravetz wrote:
>>>>>> On 4/29/22 05:18, Muchun Song wrote:
>>>>>>> +static void vmemmap_optimize_mode_switch(enum vmemmap_optimize_mode to)
>>>>>>> +{
>>>>>>> + if (vmemmap_optimize_mode == to)
>>>>>>> + return;
>>>>>>> +
>>>>>>> + if (to == VMEMMAP_OPTIMIZE_OFF)
>>>>>>> + static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>>>>>> + else
>>>>>>> + static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>>>>>> + vmemmap_optimize_mode = to;
>>>>>>> +}
>>>>>>> +
>>>>>>> static int __init hugetlb_vmemmap_early_param(char *buf)
>>>>>>> {
>>>>>>> bool enable;
>>>>>>> + enum vmemmap_optimize_mode mode;
>>>>>>>
>>>>>>> if (kstrtobool(buf, &enable))
>>>>>>> return -EINVAL;
>>>>>>>
>>>>>>> - if (enable)
>>>>>>> - static_branch_enable(&hugetlb_optimize_vmemmap_key);
>>>>>>> - else
>>>>>>> - static_branch_disable(&hugetlb_optimize_vmemmap_key);
>>>>>>> + mode = enable ? VMEMMAP_OPTIMIZE_ON : VMEMMAP_OPTIMIZE_OFF;
>>>>>>> + vmemmap_optimize_mode_switch(mode);
>>>>>>>
>>>>>>> return 0;
>>>>>>> }
>>>>>>> @@ -60,6 +80,8 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>>>>>> vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
>>>>>>> vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
>>>>>>>
>>>>>>> + VM_BUG_ON_PAGE(!vmemmap_pages, head);
>>>>>>> +
>>>>>>> /*
>>>>>>> * The pages which the vmemmap virtual address range [@vmemmap_addr,
>>>>>>> * @vmemmap_end) are mapped to are freed to the buddy allocator, and
>>>>>>> @@ -69,8 +91,10 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>>>>>> */
>>>>>>> ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
>>>>>>> GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
>>>>>>> - if (!ret)
>>>>>>> + if (!ret) {
>>>>>>> ClearHPageVmemmapOptimized(head);
>>>>>>> + static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>>>>>> + }
>>>>>>>
>>>>>>> return ret;
>>>>>>> }
>>>>>>> @@ -84,6 +108,8 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
>>>>>>> if (!vmemmap_pages)
>>>>>>> return;
>>>>>>>
>>>>>>> + static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>>>>>
>>>>>> Can you explain the reasoning behind doing the static_branch_inc here in free,
>>>>>> and static_branch_dec in alloc?
>>>>>> IIUC, they may not be absolutely necessary but you could use the count to
>>>>>> know how many optimized pages are in use? Or, I may just be missing
>>>>>> something.
>>>>>>
>>>>>
>>>>> Partly right. One 'count' is not enough. I have implemented this with similar
>>>>> approach in v6 [1]. Except the 'count', we also need a lock to do synchronization.
>>>>> However, both count and synchronization are included in static_key_inc/dec
>>>>> infrastructure. It is simpler to use static_key_inc/dec directly, right?
>>>>>
>>>>> [1] https://lore.kernel.org/all/20220330153745.20465-5-songmuchun@xxxxxxxxxxxxx/
>>>>>
>>>>
>>>> Sorry, but I am a little confused.
>>>>
>>>> vmemmap_optimize_mode_switch will static_key_inc to enable and static_key_dec
>>>> to disable. In addition each time we optimize (allocate) a hugetlb page after
>>>> enabling we will static_key_inc.
>>>>
>>>> Suppose we have 1 hugetlb page optimized. So static count == 2 IIUC.
>>>> Then someone turns off optimization via sysctl. static count == 1 ???
>>>
>>> Definitely right.
>>>
>>>> If we then add another hugetlb page via nr_hugepages it seems that it
>>>> would be optimized as static count == 1. Is that correct? Do we need
>>>
>>> I'm wrong.
>>>
>>>> to free all hugetlb pages with optimization before we can add new pages
>>>> without optimization?
>>>>
>>>
>>> My bad. I think the following code would fix this.
>>>
>>> Thanks for your review carefully.
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index 5820a681a724..997e192aeed7 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -105,7 +105,7 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
>>> unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
>>>
>>> vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
>>> - if (!vmemmap_pages)
>>> + if (!vmemmap_pages || READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
>>> return;
>>>
>>> static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>>
>>
>> If vmemmap_optimize_mode == VMEMMAP_OPTIMIZE_OFF is sufficient for turning
>> off optimizations, do we really need to static_branch_inc/dec for each
>> hugetlb page?
>>
>
> static_branch_inc/dec is necessary since the user could change
> vmemmap_optimize_mode to off after the 'if' judgement.
>
> CPU0:                                           CPU1:
> // Assume vmemmap_optimize_mode == 1
> // and static_key_count == 1
> if (vmemmap_optimize_mode == VMEMMAP_OPTIMIZE_OFF)
>         return;
>                                                 hugetlb_optimize_vmemmap_handler();
>                                                     vmemmap_optimize_mode = 0;
>                                                     static_branch_dec();
>                                                     // static_key_count == 0
> // Enable static_key if necessary
> static_branch_inc();
>
> Does this make sense for you?

Yes, it makes sense and is required because hugetlb_optimize_vmemmap_pages()
performs two functions:
1) It determines if vmemmap optimization is enabled
2) It specifies how many vmemmap pages can be saved with optimization
hugetlb_optimize_vmemmap_pages() returns 0 if static_key_count == 0, so this
would cause problems in places such as the hugetlb free path
(hugetlb_vmemmap_alloc). I hope my understanding is correct?
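
To check my own understanding, here is a small userspace model of the
counting. It is only a sketch: every name below is made up for illustration,
the value 7 is arbitrary, and key_count merely stands in for the static key's
count. The per_page_ref switch lets the same code show what happens with and
without the per-page inc/dec.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the counting; not kernel code. */
#define MODEL_VMEMMAP_PAGES 7	/* arbitrary illustrative value */

static int key_count;			/* stands in for the static key's count */
static bool per_page_ref = true;	/* model the per-page inc/dec or not */

/* Dual-purpose helper: 0 means "disabled", otherwise the number of pages. */
static unsigned int model_optimize_vmemmap_pages(void)
{
	return key_count > 0 ? MODEL_VMEMMAP_PAGES : 0;
}

/* Models the sysctl/boot parameter taking or dropping one reference. */
static void model_mode_switch(bool on)
{
	key_count += on ? 1 : -1;
}

/* hugetlb allocation time: returns true if the page was optimized. */
static bool model_vmemmap_free(void)
{
	if (!model_optimize_vmemmap_pages())
		return false;
	if (per_page_ref)
		key_count++;	/* the optimized page pins the key */
	return true;
}

/*
 * hugetlb free path for an optimized page: returns true if the helper still
 * reports a usable page count, so the vmemmap pages can be restored.
 */
static bool model_vmemmap_alloc(void)
{
	if (!model_optimize_vmemmap_pages())
		return false;	/* count dropped to 0: nothing to work with */
	if (per_page_ref)
		key_count--;
	return true;
}
```

With per_page_ref set, switching the mode off while one optimized page exists
leaves key_count at 1, so the later free path still sees a non-zero page
count. With it cleared, the same sequence leaves key_count at 0 and the free
path gets 0 back from the helper.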

Would it make the code clearer if we did not do the check for
vmemmap optimization in hugetlb_optimize_vmemmap_pages()? Instead:
- hugetlb_optimize_vmemmap_pages() ALWAYS returns the number of vmemmap pages
  that can be freed/optimized
- At hugetlb allocation time (hugetlb_vmemmap_free) we only check
  hugetlb_optimize_vmemmap_enabled() to determine if optimization should
  be performed.
- After hugetlb_vmemmap_free, we can use HPageVmemmapOptimized to determine
  if vmemmap pages need to be allocated in hugetlb freeing paths.
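
Roughly, the split could look like the userspace model below. This is a
sketch only, not a proposed patch: the struct and function names are invented
for illustration, the optimize_enabled flag stands in for
hugetlb_optimize_vmemmap_enabled(), and the vmemmap_optimized field stands in
for HPageVmemmapOptimized. The real remap/alloc work is left out.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these are the kernel definitions. */
struct split_hstate {
	unsigned int optimizable_vmemmap_pages;	/* pages that could be freed */
};

struct split_hpage {
	bool vmemmap_optimized;	/* stands in for HPageVmemmapOptimized */
};

/* Stands in for hugetlb_optimize_vmemmap_enabled(). */
static bool optimize_enabled;

/* Always reports the page count, regardless of the on/off state. */
static unsigned int split_optimize_vmemmap_pages(const struct split_hstate *h)
{
	return h->optimizable_vmemmap_pages;
}

/* hugetlb allocation time: the only place the on/off state is consulted. */
static void split_vmemmap_free(const struct split_hstate *h,
			       struct split_hpage *page)
{
	if (!split_optimize_vmemmap_pages(h) || !optimize_enabled)
		return;
	/* The real code would remap and free the vmemmap pages here. */
	page->vmemmap_optimized = true;
}

/* hugetlb freeing path: only the per-page flag decides what to do. */
static int split_vmemmap_alloc(const struct split_hstate *h,
			       struct split_hpage *page)
{
	(void)h;	/* the real code would need the page count from h */
	if (!page->vmemmap_optimized)
		return 0;
	/* The real code would allocate and remap vmemmap pages; it can fail. */
	page->vmemmap_optimized = false;
	return 0;
}
```

In this model, a page optimized while the flag was on is still restored
correctly after the flag is turned off, and a page allocated after that point
is left unoptimized.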

Perhaps there is something wrong with the above suggestion?

I know you have always had hugetlb_optimize_vmemmap_pages() perform the two
functions. So, splitting functionality may not be clearer to you. I am
OK leaving the code as is (key inc/dec for each page). Just wanted to get your
(and perhaps others') thoughts on splitting functionality as described above.
--
Mike Kravetz