Re: [PATCH v8 11/12] mm/vmalloc: Hugepage vmalloc mappings

From: Nicholas Piggin
Date: Fri Dec 04 2020 - 03:13:57 EST


Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21 am:
> On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> support PMD-sized vmap mappings.
>>
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that is unsuccessful.
>>
>> Allocations that do not use PAGE_KERNEL prot are not permitted to use
>> huge pages, because not all callers expect this (e.g., module
>> allocations vs strict module rwx).
>
> Several architectures (x86, arm64, others?) allocate modules initially
> with PAGE_KERNEL and so I think this test will not exclude module
> allocations in those cases.

Ah, thanks. I guess archs must additionally ensure that their
PAGE_KERNEL allocations are suitable for huge page mappings before
enabling the option.
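
To be explicit about where the check falls short, the gate in this series
is basically just a prot test, something like this (a paraphrase with a
made-up helper name, not the exact code from the patch):

static bool vmalloc_may_use_huge(unsigned long size, pgprot_t prot)
{
	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
		return false;
	/* only plain kernel mappings get promoted to huge pages */
	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL))
		return false;
	return size >= PMD_SIZE;
}

Since module_alloc() on those archs passes PAGE_KERNEL and only restricts
permissions afterwards with set_memory_*(), a prot test like this doesn't
keep module allocations on small pages, as you say.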

If there is interest from those archs in supporting this, I have an
early (unposted) patch that adds an explicit VM_HUGE flag which could
override the pessimistic arch default. It's not much trouble to add this
to the large system hash allocations. It's very out of date now, but I
can at least give what I have to anyone doing the arch support who
wants it.
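
Very roughly, the idea was along these lines (an untested sketch; the
flag value and the call site below are placeholders, not the real patch):

	/* hypothetical vm_struct flag; the real patch would pick a free bit */
	#define VM_HUGE		0x00001000

In __vmalloc_node_range(), the huge path would then require the flag as
well as PAGE_KERNEL, e.g.:

	if ((vm_flags & VM_HUGE) && size >= PMD_SIZE &&
	    pgprot_val(prot) == pgprot_val(PAGE_KERNEL))
		shift = PMD_SHIFT;	/* else stay with PAGE_SHIFT */

and an opt-in site like alloc_large_system_hash() would just pass it:

	table = __vmalloc_node_range(size, PAGE_SIZE,
				     VMALLOC_START, VMALLOC_END,
				     GFP_KERNEL, PAGE_KERNEL, VM_HUGE,
				     NUMA_NO_NODE,
				     __builtin_return_address(0));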

>
> [snip]
>
>> @@ -2400,6 +2453,7 @@ static inline void set_area_direct_map(const struct vm_struct *area,
>>  {
>>  	int i;
>>
>> +	/* HUGE_VMALLOC passes small pages to set_direct_map */
>>  	for (i = 0; i < area->nr_pages; i++)
>>  		if (page_address(area->pages[i]))
>>  			set_direct_map(area->pages[i]);
>> @@ -2433,11 +2487,12 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
>>  	 * map. Find the start and end range of the direct mappings to make sure
>>  	 * the vm_unmap_aliases() flush includes the direct map.
>>  	 */
>> -	for (i = 0; i < area->nr_pages; i++) {
>> +	for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {
>>  		unsigned long addr = (unsigned long)page_address(area->pages[i]);
>>  		if (addr) {
>> +			unsigned long page_size = PAGE_SIZE << area->page_order;
>>  			start = min(addr, start);
>> -			end = max(addr + PAGE_SIZE, end);
>> +			end = max(addr + page_size, end);
>>  			flush_dmap = 1;
>>  		}
>>  	}
>
> The logic around this is a bit tangled. The reset of the direct map has
> to succeed, but if the set_direct_map_() functions require a split they
> could fail. For x86, set_memory_ro() calls on a vmalloc alias will
> mirror the page size and permission on the direct map and so the direct
> map will be broken to 4k pages if it's a RO vmalloc allocation.
>
> But after this, module vmalloc()'s could have large pages which would
> result in large RO pages on the direct map. Then it could possibly fail
> when trying to reset a 4k page out of a large RO direct map mapping.
>
> I think either module allocations need to be actually excluded from
> having large pages (seems like you might have seen other issues as
> well?), or another option could be to use the changes here:
> https://lore.kernel.org/lkml/20201125092208.12544-4-rppt@xxxxxxxxxx/
> to reset the direct map for a large page range at a time for large
> vmalloc pages.
>

Right, x86 would have to do something about that before enabling it.
A VM_HUGE flag might be the quick and easy option, but maybe the other
approaches are not too difficult either.

Thanks,
Nick