Re: [PATCH bpf-next 1/3] mm/vmalloc: introduce vmalloc_exec which allocates RO+X memory

From: Peter Zijlstra
Date: Thu Jul 14 2022 - 03:27:06 EST


On Wed, Jul 13, 2022 at 10:16:36PM -0700, Christoph Hellwig wrote:
> On Wed, Jul 13, 2022 at 12:20:09PM +0200, Peter Zijlstra wrote:
> > Start by adding VM_TOPDOWN_VMAP, which instead of returning the lowest
> > (leftmost) vmap_area that fits, picks the highest (rightmost).
> >
> > Then add module_alloc_data() that uses VM_TOPDOWN_VMAP and make
> > ARCH_WANTS_MODULE_DATA_IN_VMALLOC use that instead of vmalloc (with a
> > weak function doing the vmalloc).
> >
> > This gets you a module range whose bottom is RO+X only, while the top
> > is shattered between different !X types.
> >
> > Then track the boundary between X and !X and ensure module_alloc_data()
> > and module_alloc() never cross over and stay strictly separated.
> >
> > Then change all module_alloc() users to expect RO+X memory, instead of
> > RW.
> >
> > Then make sure any extension of the X range is 2M aligned.
> >
> > And presto, *everybody* always uses 2M TLB for text, modules, bpf,
> > ftrace, the lot and nobody is tracking chunks.
> >
> > Maybe migration can be eased by instead providing module_alloc_text()
> > and ARCH_WANTS_MODULE_ALLOC_TEXT.
>
> This all looks pretty sensible. How are we going to do the initial
> write to the executable memory, though?

With something like text_poke_memcpy(). I suppose that the proposed
ARCH_WANTS_MODULE_ALLOC_TEXT needs to imply availability of that too.

If the 4K copy thing ends up being a bottleneck we can easily extend
that to have a 2M option as well.
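
To make the chunking concrete, here is a rough userspace sketch of the
loop shape I have in mind. It is not the kernel code: poke_chunk() and
poke_copy() are made-up names, and poke_chunk() is a plain memcpy()
standing in for whatever the arch does to write through a temporary
writable alias. The only point is that no single poke crosses a
chunk-aligned boundary, and that the chunk size is a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_4K (4096UL)
#define CHUNK_2M (2UL * 1024 * 1024)

/*
 * Hypothetical stand-in for the arch primitive. A real implementation
 * would write through a temporary writable mapping of the RO+X target;
 * here it is only a memcpy() so the loop can be exercised in userspace.
 */
static void poke_chunk(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/*
 * Copy len bytes from src to dst such that each poke_chunk() call stays
 * within one chunk-aligned window. chunk must be a power of two.
 * Returns the number of pokes issued.
 */
static size_t poke_copy(void *dst, const void *src, size_t len, size_t chunk)
{
	uintptr_t addr = (uintptr_t)dst;
	const unsigned char *from = src;
	size_t done = 0, pokes = 0;

	assert(chunk && (chunk & (chunk - 1)) == 0);

	while (done < len) {
		uintptr_t cur = addr + done;
		/* Bytes left before the next chunk-aligned boundary. */
		size_t room = chunk - (cur & (chunk - 1));
		size_t n = len - done < room ? len - done : room;

		poke_chunk((void *)cur, from + done, n);
		done += n;
		pokes++;
	}

	return pokes;
}
```

Passing CHUNK_2M instead of CHUNK_4K is all the "2M option" amounts to
in this sketch; whether the real primitive can map that much at once is
a separate question.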