Re: [PATCH v6 bpf-next 0/5] bpf_prog_pack followup

From: Luis Chamberlain
Date: Fri Jul 08 2022 - 18:24:46 EST


On Fri, Jul 08, 2022 at 07:58:44PM +0000, Song Liu wrote:
>
>
> > On Jul 8, 2022, at 8:58 AM, Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> >
> > On Fri, Jul 08, 2022 at 01:36:25AM +0000, Song Liu wrote:
> >>
> >>
> >>> On Jul 7, 2022, at 5:53 PM, Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> >>>
> >>> On Thu, Jul 07, 2022 at 11:52:58PM +0000, Song Liu wrote:
> >>>>> On Jul 7, 2022, at 3:59 PM, Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On Thu, Jul 07, 2022 at 03:35:41PM -0700, Song Liu wrote:
> >>>>>> This set is the second half of v4 [1].
> >>>>>>
> >>>>>> Changes v5 => v6:
> >>>>>> 1. Rebase and extend CC list.
> >>>>>
> >>>>> Why post a new iteration so soon without completing the discussion we
> >>>>> had? It seems like we were at least going somewhere. If it's just
> >>>>> to include mm as I requested, sure, that's fine, but this does not
> >>>>> provide context as to what we last were talking about.
> >>>>
> >>>> Sorry for sending v6 too soon. The primary reason was to extend the CC
> >>>> list and add it back to patchwork (v5 somehow got archived).
> >>>>
> >>>> Also, I think vmalloc_exec_ work would be a separate project, while this
> >>>> set is the followup work of bpf_prog_pack. Does this make sense?
> >>>>
> >>>> Btw, vmalloc_exec_ work could be a good topic for LPC. It will be much
> >>>> more efficient to discuss this in person.
> >>>
> >>> What we need is input from mm / arch folks. What is not done here is
> >>> what that stuff we're talking about is and so mm folks can't guess. My
> >>> preference is to address that.
> >>>
> >>> I don't think in person discussion is needed if the only folks
> >>> discussing this topic so far is just you and me.
> >>
> >> How about we start a thread with mm / arch folks for the vmalloc_exec_*
> >> topic? I will summarize previous discussions and include pointers to
> >> these discussions. If necessary, we can continue the discussion at LPC.
> >
> > This sounds like a nice thread to use as this is why we are talking
> > about that topic.
> >
> >> OTOH, I guess the outcome of that discussion should not change this set?
> >
> > If the above is done right then actually I think it would show similar
> > considerations for a respective free for module_alloc_huge().
> >
> >> If we have concern about module_alloc_huge(), maybe we can have bpf code
> >> call vmalloc directly (until we have vmalloc_exec_)?
> >
> > You'd need to then still open code in a similar way the same things
> > which we are trying to reach consensus on.
>
> >> What do you think about this plan?
> >
> > I think we should strive to not be lazy and sloppy, and prevent growth
> > of sloppy code. So long as we do that I think this is all reasoanble.
>
> Let me try to understand your concerns here. Say if we want module code
> to be a temporary home for module_alloc_huge before we move it to mm
> code. Would you think it is ready to ship if we:

Please CC Christoph and linux-modules@xxxxxxxxxxxxxxx on future patches
and dicussions aroudn this, and all others now CC'd.

> 1) Rename module_alloc_huge as module_alloc_text_huge();

module_alloc_text_huge() is too long, but I've suggested names before
which are short and generic, and also suggested that if modules are
not the only users this needs to go outside of modules and so
vmalloc_text_huge() or whatever.

To do this right it begs the question why we don't do that for the
existing module_alloc(), as the users of this code is well outside of
modules now. Last time a similar generic name was used all the special
arch stuff was left to be done by the module code still, but still
non-modules were still using that allocator. From my perspective the
right thing to do is to deal with all the arch stuff as well in the
generic handler, and have the module code *and* the other users which
use module_alloc() to use that new caller as well.

> 2) Add module_free_text_huge();

Right, we have special handling for how we free this special code for regular
module_alloc() and so similar considerations would be needed here for
the huge stuff.

> 3) Move set_memory_* and fill_ill_insn logic into module_alloc_text_huge()
> and module_free_text_huge().

Yes, that's a bit hairy now, and so a saner and consistent way to do
this would be best.

> Are these on the right direction? Did I miss anything important?

I've also hinted before that another way to help here is to have draw
up a simple lib/test_vmalloc_text.c or something like that which would
enable a selftest to ensure correctness of this code on different archs
and maybe even let you do performance analysis using perf [0]. You have
good reasons to move to the huge allocator and the performance metrics
are an abstract load, however perf measurements can also give you real
raw data which you can reproduce and enable others to do similar
comparisons later.

The last thing I'd ask is just ensure you Cc folks who have already been in
these discussions.

[0] https://lkml.kernel.org/r/Yog+d+oR5TtPp2cs@xxxxxxxxxxxxxxxxxxxxxx

Luis