Re: [PATCH v9 4/4] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl

From: Mike Kravetz
Date: Thu May 05 2022 - 12:49:17 EST


On 5/5/22 01:02, Muchun Song wrote:
> On Wed, May 04, 2022 at 08:36:00PM -0700, Mike Kravetz wrote:
>> On 5/4/22 19:35, Muchun Song wrote:
>>> On Wed, May 04, 2022 at 03:12:39PM -0700, Mike Kravetz wrote:
>>>> On 4/29/22 05:18, Muchun Song wrote:
>>>>> +static void vmemmap_optimize_mode_switch(enum vmemmap_optimize_mode to)
>>>>> +{
>>>>> + if (vmemmap_optimize_mode == to)
>>>>> + return;
>>>>> +
>>>>> + if (to == VMEMMAP_OPTIMIZE_OFF)
>>>>> + static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>>>> + else
>>>>> + static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>>>> + vmemmap_optimize_mode = to;
>>>>> +}
>>>>> +
>>>>> static int __init hugetlb_vmemmap_early_param(char *buf)
>>>>> {
>>>>> bool enable;
>>>>> + enum vmemmap_optimize_mode mode;
>>>>>
>>>>> if (kstrtobool(buf, &enable))
>>>>> return -EINVAL;
>>>>>
>>>>> - if (enable)
>>>>> - static_branch_enable(&hugetlb_optimize_vmemmap_key);
>>>>> - else
>>>>> - static_branch_disable(&hugetlb_optimize_vmemmap_key);
>>>>> + mode = enable ? VMEMMAP_OPTIMIZE_ON : VMEMMAP_OPTIMIZE_OFF;
>>>>> + vmemmap_optimize_mode_switch(mode);
>>>>>
>>>>> return 0;
>>>>> }
>>>>> @@ -60,6 +80,8 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>>>> vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
>>>>> vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
>>>>>
>>>>> + VM_BUG_ON_PAGE(!vmemmap_pages, head);
>>>>> +
>>>>> /*
>>>>> * The pages which the vmemmap virtual address range [@vmemmap_addr,
>>>>> * @vmemmap_end) are mapped to are freed to the buddy allocator, and
>>>>> @@ -69,8 +91,10 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
>>>>> */
>>>>> ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
>>>>> GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
>>>>> - if (!ret)
>>>>> + if (!ret) {
>>>>> ClearHPageVmemmapOptimized(head);
>>>>> + static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>>>> + }
>>>>>
>>>>> return ret;
>>>>> }
>>>>> @@ -84,6 +108,8 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
>>>>> if (!vmemmap_pages)
>>>>> return;
>>>>>
>>>>> + static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>>>
>>>> Can you explain the reasoning behind doing the static_branch_inc here in free,
>>>> and static_branch_dec in alloc?
>>>> IIUC, they may not be absolutely necessary but you could use the count to
>>>> know how many optimized pages are in use? Or, I may just be missing
>>>> something.
>>>>
>>>
>>> Partly right. One 'count' is not enough. I implemented a similar approach
>>> in v6 [1]. Besides the 'count', we also need a lock for synchronization.
>>> However, both the count and the synchronization are already included in the
>>> static_branch_inc/dec infrastructure. It is simpler to use
>>> static_branch_inc/dec directly, right?
>>>
>>> [1] https://lore.kernel.org/all/20220330153745.20465-5-songmuchun@xxxxxxxxxxxxx/
>>>
>>
>> Sorry, but I am a little confused.
>>
>> vmemmap_optimize_mode_switch will static_branch_inc to enable and
>> static_branch_dec to disable. In addition, each time we optimize (allocate)
>> a hugetlb page after enabling, we will static_branch_inc.
>>
>> Suppose we have 1 hugetlb page optimized. So static count == 2 IIUC.
>> Then someone turns off optimization via sysctl. static count == 1 ???
>
> Definitely right.
>
>> If we then add another hugetlb page via nr_hugepages it seems that it
>> would be optimized as static count == 1. Is that correct? Do we need
>
> I'm wrong.
>
>> to free all hugetlb pages with optimization before we can add new pages
>> without optimization?
>>
>
> My bad. I think the following code would fix this.
>
> Thanks for your careful review.
>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 5820a681a724..997e192aeed7 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -105,7 +105,7 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
> unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
>
> vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
> - if (!vmemmap_pages)
> + if (!vmemmap_pages || READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
> return;
>
> static_branch_inc(&hugetlb_optimize_vmemmap_key);
>

If vmemmap_optimize_mode == VMEMMAP_OPTIMIZE_OFF is sufficient for turning
off optimizations, do we really need to static_branch_inc/dec for each
hugetlb page?

--
Mike Kravetz