Re: kernel BUG at mm/hugetlb.c:LINE!

From: Mike Kravetz
Date: Wed May 20 2020 - 13:27:40 EST


On 5/20/20 4:20 AM, Miklos Szeredi wrote:
> On Tue, May 19, 2020 at 2:35 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>>
>> On 5/18/20 4:41 PM, Colin Walters wrote:
>>>
>>> On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote:
>>>
>>>>> However, in this syzbot test case the 'file' is in an overlayfs filesystem
>>>>> created as follows:
>>>>>
>>>>> mkdir("./file0", 000) = 0
>>>>> mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0
>>>>> chdir("./file0") = 0
>>>>> mkdir("./file1", 000) = 0
>>>>> mkdir("./bus", 000) = 0
>>>>> mkdir("./file0", 000) = 0
>>>>> mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0
>>>
>>> Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs? I can't think of one. Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems?
>>>
>>
>> I can not think of a use case. I'll let Miklos comment on adding whitelist
>> capability to overlayfs.
>
> I've not heard of overlayfs being used over hugetlbfs.
>
> Overlayfs on tmpfs is definitely used, I guess without hugepages.
> But if we'd want to allow tmpfs _without_ hugepages but not tmpfs
> _with_ hugepages, then we can't just whitelist based on filesystem
> type, but need to look at mount options as well. Which isn't really a
> clean solution either.
>
>> IMO - This BUG/report revealed two issues. First is the BUG by mmap'ing
>> a hugetlbfs file on overlayfs. The other is that core mmap code will skip
>> any filesystem specific get_unmapped area routine if on a union/overlay.
>> My patch fixes both, but if we go with a whitelist approach and don't allow
>> hugetlbfs I think we still need to address the filesystem specific
>> get_unmapped area issue. That is easy enough to do by adding a routine to
>> overlayfs which calls the routine for the underlying fs.
>
> I think the two are strongly related: get_unmapped_area() adjusts the
> address alignment, and the is_file_hugepages() call in
> ksys_mmap_pgoff() adjusts the length alignment.
>
> Is there any other purpose for which f_op->get_unmapped_area() is
> used by a filesystem?
>

I am fairly confident it is all about checking limits and alignment. The
filesystem knows if it can/should align to base or huge page size. DAX has
some interesting additional restrictions, and several 'traditional' filesystems
check if they are 'on DAX'.

In a previous e-mail, you suggested hugetlb_get_unmapped_area could do the
length adjustment in hugetlb_get_unmapped_area (generic and arch specific).
I agree, although there may be the need to add length overflow checks in
these routines (after round up) as this is done in core code now. However,
this can be done as a separate cleanup patch.

In any case, we need to get the core mmap code to call filesystem specific
get_unmapped_area if on a union/overlay. The patch I suggested does this
by simply calling real_file to determine if there is a filesystem specific
get_unmapped_area. The other approach would be to provide an overlayfs
get_unmapped_area that calls the underlying filesystem get_unmapped_area.

--
Mike Kravetz