Re: [PATCH v4 1/4] mm: Optimize mprotect() for MM_CP_PROT_NUMA by batch-skipping PTEs
From: Lorenzo Stoakes
Date: Wed Jul 02 2025 - 11:43:32 EST
On Wed, Jul 02, 2025 at 08:31:33PM +0530, Dev Jain wrote:
>
> On 02/07/25 3:07 pm, Lorenzo Stoakes wrote:
> > On Sat, Jun 28, 2025 at 05:04:32PM +0530, Dev Jain wrote:
> > > In case of prot_numa, there are various cases in which we can skip to the
> > > next iteration. Since the skip condition is based on the folio and not
> > > the PTEs, we can skip a PTE batch. Additionally refactor all of this
> > > into a new function to clean up the existing code.
> > >
> > > Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
> > > ---
> > > mm/mprotect.c | 134 ++++++++++++++++++++++++++++++++------------------
> > > 1 file changed, 87 insertions(+), 47 deletions(-)
> > >
> > > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > > index 88709c01177b..af10a7fbe6b8 100644
> > > --- a/mm/mprotect.c
> > > +++ b/mm/mprotect.c
> > > @@ -83,6 +83,83 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
> > > return pte_dirty(pte);
> > > }
> > >
> > > +static int mprotect_folio_pte_batch(struct folio *folio, unsigned long addr,
> > > + pte_t *ptep, pte_t pte, int max_nr_ptes)
> > > +{
> > > + const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> > > +
> > > + if (!folio || !folio_test_large(folio) || (max_nr_ptes == 1))
> > > + return 1;
> > > +
> > > + return folio_pte_batch(folio, addr, ptep, pte, max_nr_ptes, flags,
> > > + NULL, NULL, NULL);
> > > +}
> > > +
> > > +static int prot_numa_skip_ptes(struct folio **foliop, struct vm_area_struct *vma,
> > > + unsigned long addr, pte_t oldpte, pte_t *pte, int target_node,
> > > + int max_nr_ptes)
> > > +{
> > While it's nice to separate this out, it's not so nice to pass folio as a
> > pointer to a pointer like this and maybe or maybe not set it.
> >
> > Just get the folio before you call this... you'll need it either way.
>
> I did that on David's suggestion:
>
> https://lore.kernel.org/all/8c389ee5-f7a4-44f6-a0d6-cc01c3da4d91@xxxxxxxxxx/
>
> We were trying to reuse the folio if available from prot_numa_skip_ptes,
> to avoid using vm_normal_folio() again. Not sure if avoiding vm_normal_folio
> is worth the complexity.
Well, do you need to? You're doing vm_normal_folio() in both cases, why not just
put the vm_normal_folio() lookup in change_pte_range() before invoking this
function, then reuse the folio in the loop?
Oh right, I guess David was concerned about not needing to look it up in the
pte_protnone(oldpte) case?
I'm not sure that's worth it honestly. If we do _really_ want to do this, then
at least put the param last