Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings

From: Kefeng Wang
Date: Wed Jan 19 2022 - 08:32:21 EST



On 2022/1/19 12:17, Nicholas Piggin wrote:
> Excerpts from Dave Hansen's message of January 19, 2022 3:28 am:
> > On 1/17/22 6:46 PM, Nicholas Piggin wrote:
> > > > This all sounds very fragile to me. Every time a new architecture would
> > > > get added for huge vmalloc() support, the developer needs to know to go
> > > > find that architecture's module_alloc() and add this flag.
> > > This is documented in the Kconfig.
> > >
> > > #
> > > # Archs that select this would be capable of PMD-sized vmaps (i.e.,
> > > # arch_vmap_pmd_supported() returns true), and they must make no assumptions
> > > # that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
> > > # can be used to prohibit arch-specific allocations from using hugepages to
> > > # help with this (e.g., modules may require it).
> > > #
> > > config HAVE_ARCH_HUGE_VMALLOC
> > > 	depends on HAVE_ARCH_HUGE_VMAP
> > > 	bool
> > >
> > > Is it really fair to say it's *very* fragile? Surely it's reasonable to
> > > read the (not very long) documentation and understand the consequences for
> > > the arch code before enabling it.
> > Very fragile or not, I think folks are likely to get it wrong. It would
> > be nice to have it default *everyone* to safe and slow and make *sure*
> It's not safe to enable though. That's the problem. If it was just
> modules then you'd have a point but it could be anything.
>
> > they go look at the architecture modules code itself before enabling
> > this for modules.
> This is required not just for modules but for the whole arch code; it
> has to be looked at and decided this will work.
>
> > Just from that Kconfig text, I don't think I'd know off the top of my
> > head what to do for x86, or what code I needed to go touch.
> You have to make sure arch/x86 makes no assumptions that vmalloc memory
> is backed by PAGE_SIZE ptes. If you can't do that then you shouldn't
> enable the option. The option can not explain it any more because any
> arch could do anything with its mappings. The module code is an example,
> not the recipe.

Hi Nick, Dave and Christophe, thanks for your review. I'm a little confused;
here is my understanding:

1) For ppc/arm64 module_alloc(), VM_NO_HUGE_VMAP must be set, because those
   archs' set_memory_*() functions only support PAGE_SIZE mappings, due to a
   limitation of apply_to_page_range().

2) For x86's module_alloc(), adding VM_NO_HUGE_VMAP only avoids fragmentation:
   x86's __change_page_attr() functions will split the huge mapping, so the
   flag is not strictly required.


The behavior above occurs when STRICT_MODULE_RWX is enabled, so either

1) adding a unified function to set the vm flags (suggested by Dave), or

2) adding the vm flags, with some comments, to each arch's module_alloc()

is acceptable. With the unified function, we could make this the default
recipe with STRICT_MODULE_RWX, and also fold two more vm flags into it, e.g.,

+unsigned long module_alloc_vm_flags(bool need_flush_reset_perms)
+{
+       unsigned long vm_flags = VM_DEFER_KMEMLEAK;
+
+       if (need_flush_reset_perms)
+               vm_flags |= VM_FLUSH_RESET_PERMS;
+       /*
+        * Modules use a single, large vmalloc(). Different permissions
+        * are applied later and will fragment huge mappings or even
+        * fail in set_memory_* on some architectures. Avoid using
+        * huge pages for modules.
+        */
+       if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
+               vm_flags |= VM_NO_HUGE_VMAP;
+
+       return vm_flags;
+}

and then call it from each arch's module_alloc().
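
To make the intended flag selection concrete, here is a rough user-space
model of the helper, not kernel code. The bit values are placeholders
chosen for this sketch (the real definitions are in the kernel headers),
and the strict_module_rwx variable is a hypothetical stand-in for
IS_ENABLED(CONFIG_STRICT_MODULE_RWX) so that both configurations can be
tried in one build.

```c
#include <stdbool.h>

/* Placeholder bit values for this sketch only, not the kernel's. */
#define VM_FLUSH_RESET_PERMS 0x1UL
#define VM_NO_HUGE_VMAP      0x2UL
#define VM_DEFER_KMEMLEAK    0x4UL

/* Hypothetical stand-in for IS_ENABLED(CONFIG_STRICT_MODULE_RWX). */
bool strict_module_rwx = true;

/* Same flag logic as the proposed helper, modelled in user space. */
unsigned long module_alloc_vm_flags(bool need_flush_reset_perms)
{
	unsigned long vm_flags = VM_DEFER_KMEMLEAK;

	if (need_flush_reset_perms)
		vm_flags |= VM_FLUSH_RESET_PERMS;

	/* Module permissions are changed later, so avoid huge mappings. */
	if (strict_module_rwx)
		vm_flags |= VM_NO_HUGE_VMAP;

	return vm_flags;
}
```

In this model, an arch that wants the permissions reset on free would pass
true and get all three flags when strict RWX is on; with strict RWX off,
VM_NO_HUGE_VMAP is left out and huge mappings stay allowed for modules.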

Any suggestions? Many thanks.



> Thanks,
> Nick