Re: [PATCH bpf-next 5/5] bpf: use module_alloc_huge for bpf_prog_pack

From: Song Liu
Date: Thu May 19 2022 - 14:25:49 EST




> On May 19, 2022, at 9:56 AM, Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> On Thu, 2022-05-19 at 06:42 +0000, Song Liu wrote:
>> Thinking more on this. Even huge page is not supported, we can
>> allocate
>> 2MB worth of 4kB pages and keep using it. This would help direct map
>> fragmentation. And the code would also be simpler.
>>
>> Rick, I guess this is inline with some of your ideas?
>
> Yea, that is what I wondering. Potential benefits are just speculative
> though. There is a memory overhead cost, so it's not free.

Yeah, I had the same concern with memory overhead. The benefit should
not exceed 0.2% (when 2MB page is supported), so I think we can use
4kB pages for now.

>
> As for the other question of whether to fix VM_FLUSH_RESET_PERMS. If
> there really is an intention to create a more general module_alloc()
> replacement soon, then I think it is ok to side step it. An optimal
> replacement might not need it and it could be removed in that case.
> Let's at least add a WARN about it not working with huge pages though.

IIUC, it will take some effort to let kernel modules use a solution
like this. But there are some low hanging fruits, like ftrace and BPF
trampolines. Maybe that's enough to promote a more general solution.

For the WARN, I guess we need something like this?

diff --git i/include/linux/vmalloc.h w/include/linux/vmalloc.h
index b159c2789961..5e0d0a60d9d5 100644
--- i/include/linux/vmalloc.h
+++ w/include/linux/vmalloc.h
@@ -238,6 +238,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
{
struct vm_struct *vm = find_vm_area(addr);

+ WARN_ON_ONCE(is_vm_area_hugepages(addr));
if (vm)
vm->flags |= VM_FLUSH_RESET_PERMS;
}


Thanks,
Song

>
> I also think the benchmarking so far is not sufficient to make the case
> that huge page mappings help your workload since the direct map splits
> were also different between the tests. I was expecting it to help
> though. Others were the ones that asked for that, so just commenting my
> analysis here.