Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs

From: Hushijie
Date: Sat May 16 2020 - 03:47:42 EST


>On 5/14/20 7:31 AM, Shijie Hu wrote:
>> Here is a final patch, based on a few previous discussions and patches
>> from me, to address the problem that hugetlb_get_unmapped_area() can't
>> get an unmapped area below the mmap base for huge pages.
>>
>> I'm so sorry. When sending the v2 and v3 patches, I forgot to cc:
>> linux-mm@xxxxxxxxx and linux-kernel@xxxxxxxxxxxxxxxx. There are no records
>> of these two patches on https://lkml.org/lkml/.
>>
>> Patch V1: https://lkml.org/lkml/2020/4/27/440
>> Patch V4: https://lkml.org/lkml/2020/5/13/980
>>
>> Changes in V2:
>> * Following Mike's suggestions, move the hugetlb_get_unmapped_area()
>> routines from "fs/hugetlbfs/inode.c" to "arch/arm64/mm/hugetlbpage.c",
>> without changing core code.
>> * Add a mmap_is_legacy() function, and fall back to the bottom-up function
>> only in non-legacy mode.
>>
>> Changes in V3:
>> * Add two functions, *bottomup() and *topdown(), and check whether
>> mm->get_unmapped_area is equal to arch_get_unmapped_area() instead of
>> checking mmap_is_legacy() to determine which function should be used.
>>
>> Changes in V4:
>> * Following the suggestions of Will and Mike, move this patch back to
>> core code.
>>
>> Changes in V5:
>> * Fix kbuild test error.
>>
>> ------
>>
>> In a 32-bit program running on the arm64 architecture, when the address
>> space below the mmap base is completely exhausted, shmat() for huge pages
>> returns ENOMEM, but shmat() for normal pages can still succeed in
>> non-legacy mode. This seems unfair.
>>
>> For normal pages, get_unmapped_area() calling flows are:
>> => mm->get_unmapped_area()
>> if in legacy mode,
>> => arch_get_unmapped_area()
>> => vm_unmapped_area()
>> if in non-legacy mode,
>> => arch_get_unmapped_area_topdown()
>> => vm_unmapped_area()
>>
>> For huge pages, get_unmapped_area() calling flows are:
>> => file->f_op->get_unmapped_area()
>> => hugetlb_get_unmapped_area()
>> => vm_unmapped_area()
>>
>> To solve this issue, we only need to make hugetlb_get_unmapped_area()
>> follow the same path as mm->get_unmapped_area(). Add two functions,
>> *bottomup() and *topdown(), and check the current mm->get_unmapped_area
>> to decide which one to use. If mm->get_unmapped_area is equal to
>> arch_get_unmapped_area(), hugetlb_get_unmapped_area() calls the bottom-up
>> routine; otherwise, it calls the top-down routine.
>>
>> Signed-off-by: Shijie Hu <hushijie3@xxxxxxxxxx>
>> Reported-by: kbuild test robot <lkp@xxxxxxxxx>
>> ---
>> fs/hugetlbfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 54 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 991c60c7ffe0..61418380f492 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -38,6 +38,7 @@
>> #include <linux/uio.h>
>>
>> #include <linux/uaccess.h>
>> +#include <linux/sched/mm.h>
>>
>> static const struct super_operations hugetlbfs_ops;
>> static const struct address_space_operations hugetlbfs_aops;
>> @@ -191,13 +192,60 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>
>> #ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>> static unsigned long
>> +hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
>> + unsigned long len, unsigned long pgoff, unsigned long flags)
>> +{
>> + struct hstate *h = hstate_file(file);
>> + struct vm_unmapped_area_info info;
>> +
>> + info.flags = 0;
>> + info.length = len;
>> + info.low_limit = current->mm->mmap_base;
>> + info.high_limit = TASK_SIZE;
>> + info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>> + info.align_offset = 0;
>> + return vm_unmapped_area(&info);
>> +}
>> +
>> +static unsigned long
>> +hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
>> + unsigned long len, unsigned long pgoff, unsigned long flags)
>> +{
>> + struct hstate *h = hstate_file(file);
>> + struct vm_unmapped_area_info info;
>> +
>> + info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>> + info.length = len;
>> + info.low_limit = max(PAGE_SIZE, mmap_min_addr);
>> + info.high_limit = current->mm->mmap_base;
>> + info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>> + info.align_offset = 0;
>> + addr = vm_unmapped_area(&info);
>> +
>> + /*
>> + * A failed mmap() very likely causes application failure,
>> + * so fall back to the bottom-up function here. This scenario
>> + * can happen with large stack limits and large mmap()
>> + * allocations.
>> + */
>> + if (unlikely(offset_in_page(addr))) {
>> + VM_BUG_ON(addr != -ENOMEM);
>> + info.flags = 0;
>> + info.low_limit = current->mm->mmap_base;
>> + info.high_limit = TASK_SIZE;
>> + addr = vm_unmapped_area(&info);
>> + }
>> +
>> + return addr;
>> +}
>> +
>> +static unsigned long
>> hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>> unsigned long len, unsigned long pgoff, unsigned long flags)
>> {
>> struct mm_struct *mm = current->mm;
>> struct vm_area_struct *vma;
>> struct hstate *h = hstate_file(file);
>> - struct vm_unmapped_area_info info;
>>
>> if (len & ~huge_page_mask(h))
>> return -EINVAL;
>> @@ -218,13 +266,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>> return addr;
>> }
>>
>> - info.flags = 0;
>> - info.length = len;
>> - info.low_limit = TASK_UNMAPPED_BASE;
>> - info.high_limit = TASK_SIZE;
>> - info.align_mask = PAGE_MASK & ~huge_page_mask(h);
>> - info.align_offset = 0;
>> - return vm_unmapped_area(&info);
>> + if (mm->get_unmapped_area == arch_get_unmapped_area)
>> + return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>> + pgoff, flags);
>> + return hugetlb_get_unmapped_area_topdown(file, addr, len,
>> + pgoff, flags);
>
>I like this code using the value of mm->get_unmapped_area to determine
>which routine to call. It is used by a few architectures. However, I
>noticed that on at least one architecture (powerpc) mm->get_unmapped_area
>may be assigned to routines other than arch_get_unmapped_area or
>arch_get_unmapped_area_topdown. In such a case, we would call the 'new'
>topdown routine. I would prefer that we call the bottomup routine in this
>default case.
>
>In reality, this does not impact powerpc as that architecture has its
>own hugetlb_get_unmapped_area routine.
>

Yes, I also noticed this before. powerpc uses radix__arch_get_unmapped_area*()
when CONFIG_PPC_RADIX_MMU is set to 'y' and radix_enabled() returns true.
However, powerpc implements its own hugetlb_get_unmapped_area(), so this
patch actually has no effect on powerpc.
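
For reference, mm->get_unmapped_area is normally assigned at exec time by
arch_pick_mmap_layout(). A simplified sketch of the generic version (roughly
paraphrased from mm/util.c; exact details vary by architecture and kernel
version) looks like this:

        void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
        {
                unsigned long random_factor = 0UL;

                if (current->flags & PF_RANDOMIZE)
                        random_factor = arch_mmap_rnd();

                if (mmap_is_legacy(rlim_stack)) {
                        /* legacy layout: bottom-up allocation above TASK_UNMAPPED_BASE */
                        mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
                        mm->get_unmapped_area = arch_get_unmapped_area;
                } else {
                        /* default layout: top-down allocation below mmap_base */
                        mm->mmap_base = mmap_base(random_factor, rlim_stack);
                        mm->get_unmapped_area = arch_get_unmapped_area_topdown;
                }
        }

So on most architectures the pointer ends up being one of those two generic
routines, but architectures such as powerpc can install something else, which
is why the comparison should only be treated as a hint with a safe default.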

>Because of this, I suggest we add a comment above this code and switch
>the if/else order. For example,
>
>+ /*
>+ * Use mm->get_unmapped_area value as a hint to use topdown routine.
>+ * If architectures have special needs, they should define their own
>+ * version of hugetlb_get_unmapped_area.
>+ */
>+ if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
>+ return hugetlb_get_unmapped_area_topdown(file, addr, len,
>+ pgoff, flags);
>+ return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>+ pgoff, flags);
>
>Thoughts?
>--
>Mike Kravetz
>
I agree with you. It's clever to switch the if/else order. If there is a
case where mm->get_unmapped_area is neither arch_get_unmapped_area() nor
arch_get_unmapped_area_topdown(), it is indeed more appropriate to make the
bottom-up routine the default behavior.

May I put the code and comment you showed above into patch v6 and add
"Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>" to it?
--
Shijie Hu

>
>> }
>> #endif
>>
>>