Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings

From: Kefeng Wang
Date: Tue Dec 28 2021 - 05:26:46 EST



On 2021/12/27 23:56, Dave Hansen wrote:
On 12/27/21 6:59 AM, Kefeng Wang wrote:
This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
support huge vmalloc mappings.
In general, this seems interesting and the diff is simple. But, I don't
see _any_ x86-specific data. I think the bare minimum here would be a
few kernel compiles and some 'perf stat' data for some TLB events.

When the feature was enabled on ppc, the commit message reported:

commit 8abddd968a303db75e4debe77a3df484164f1f33
Author: Nicholas Piggin <npiggin@xxxxxxxxx>
Date:   Mon May 3 19:17:55 2021 +1000

    powerpc/64s/radix: Enable huge vmalloc mappings

    This reduces TLB misses by nearly 30x on a `git diff` workload on a
    2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due
    to vfs hashes being allocated with 2MB pages.

But the data could differ across machines and architectures.

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 95fa745e310a..6bf5cb7d876a 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)
p = __vmalloc_node_range(size, MODULE_ALIGN,
MODULES_VADDR + get_module_load_offset(),
- MODULES_END, gfp_mask,
- PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
+ MODULES_END, gfp_mask, PAGE_KERNEL,
+ VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,
__builtin_return_address(0));
if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
vfree(p);
To figure out what's going on in this hunk, I had to look at the cover
letter (which I wasn't cc'd on). That's not great and it means that
somebody who stumbles upon this in the code is going to have a really
hard time figuring out what is going on. Cover letters don't make it
into git history.
Sorry about that, I will add more detail to the changelog of the arch patches.
This desperately needs a comment and some changelog material in *this*
patch.

But, even the description from the cover letter is sparse:

There are some disadvantages about this feature[2], one of the main
concerns is the possible memory fragmentation/waste in some scenarios,
also archs must ensure that any arch specific vmalloc allocations that
require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
That just says that x86 *needs* PAGE_SIZE allocations. But, what
happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)? Will the
subsequent permission changes just fragment the 2M mapping?

Yes, without VM_NO_HUGE_VMAP, the 2M mapping could be fragmented.

With STRICT_MODULE_RWX on x86, set_memory_ro/rw/nx on module memory ends up in
__change_page_attr(), which splits large pages, so there is no need to back
module_alloc() with huge vmalloc mappings.
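
To make the splitting concrete, here is a toy userspace model, not kernel code. It only sketches the idea that changing permissions on part of a 2M mapping forces it to be broken into 512 4K entries. The names toy_region, toy_set_writable() and toy_entries() are made up for illustration; the real work on x86 is done by __change_page_attr().

```c
#include <assert.h>

#define PAGE_SIZE_4K  4096UL
#define PMD_SIZE_2M   (2UL * 1024 * 1024)
#define PTES_PER_PMD  (PMD_SIZE_2M / PAGE_SIZE_4K)   /* 512 entries */

/* Toy model of one 2M-aligned region: one huge entry or 512 small ones. */
struct toy_region {
	int is_huge;                 /* 1 while mapped by a single 2M entry */
	int huge_writable;           /* permission of the huge entry */
	int writable[PTES_PER_PMD];  /* per-4K permission, valid after a split */
};

/* Hypothetical helper: change write permission on [off, off + len). */
static void toy_set_writable(struct toy_region *r, unsigned long off,
			     unsigned long len, int writable)
{
	unsigned long i, first, last;

	assert(len > 0 && off + len <= PMD_SIZE_2M);
	first = off / PAGE_SIZE_4K;
	last = (off + len - 1) / PAGE_SIZE_4K;

	if (r->is_huge) {
		/* A change covering the whole region keeps the huge entry. */
		if (off == 0 && len == PMD_SIZE_2M) {
			r->huge_writable = writable;
			return;
		}
		/* A partial change splits it: every 4K entry inherits the old permission. */
		for (i = 0; i < PTES_PER_PMD; i++)
			r->writable[i] = r->huge_writable;
		r->is_huge = 0;
	}

	for (i = first; i <= last; i++)
		r->writable[i] = writable;
}

/* Number of page-table entries currently backing the region. */
static unsigned long toy_entries(const struct toy_region *r)
{
	return r->is_huge ? 1 : PTES_PER_PMD;
}
```

In this model, a region that starts as one writable huge entry and then has only its first 4K made read-only ends up with toy_entries() returning 512, which is the fragmentation asked about above.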