Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack

From: Edgecombe, Rick P
Date: Tue Apr 12 2022 - 13:24:16 EST


On Mon, 2022-04-11 at 16:35 -0700, Song Liu wrote:
> @@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
> bitmap_zero(pack->bitmap, bpf_prog_pack_size /
> BPF_PROG_CHUNK_SIZE);
> list_add_tail(&pack->list, &pack_list);
>
> - set_vm_flush_reset_perms(pack->ptr);
> set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size /
> PAGE_SIZE);
> set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size /
> PAGE_SIZE);
> return pack;

Dropping set_vm_flush_reset_perms() is not mentioned in the commit log.
It is kind of a fix for a different issue.

Now that x86 supports vmalloc huge pages, but VM_FLUSH_RESET_PERMS does
not work with them, we should have some comments or warnings to that
effect somewhere. Someone may try to pass the flags in together.

> @@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct
> bpf_binary_header *hdr)
> if (bitmap_find_next_zero_area(pack->bitmap,
> bpf_prog_chunk_count(), 0,
> bpf_prog_chunk_count(), 0) ==
> 0) {
> list_del(&pack->list);
> - module_memfree(pack->ptr);


> + set_memory_nx((unsigned long)pack->ptr,
> bpf_prog_pack_size / PAGE_SIZE);
> + set_memory_rw((unsigned long)pack->ptr,
> bpf_prog_pack_size / PAGE_SIZE);
> + vfree(pack->ptr);
> kfree(pack);

Now that it calls module_alloc_huge() instead of vmalloc_node_range(),
should it call module_memfree() instead of vfree()?



Since there are bugs, simple, immediate fixes seem like the right thing
to do, but I had a couple long term focused comments on this new
feature:

It would be nice if bpf and the other module_alloc() callers could
share the same large pages. Meaning, ultimately that this whole thing
should probably live outside of bpf. BPF tracing usages might benefit
for example, and kprobes and ftrace are not too different than bpf
progs from a text allocation perspective.

I agree that the module's part is non-trivial. A while back I had tried
to do something like bpf_prog_pack() that worked for all the
module_alloc() callers. It had some modules changes to allow different
permissions to go to different allocations so they could be made to
share large pages:

https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@xxxxxxxxx/

I thought the existing kernel special permission allocation methods
were just too brittle and intertwined to improve without a new
interface. The hope was the new interface could wrap all the arch
intricacies instead of leaving them exposed in the cross-arch callers.

I wonder what you think of that general direction or if you have any
follow up plans for this?