Re: [PATCH RFC v2] mm: use per_vma lock for MADV_DONTNEED

From: Jann Horn
Date: Mon Jun 02 2025 - 11:11:07 EST


On Sat, May 31, 2025 at 12:00 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
> On Fri, May 30, 2025 at 10:07 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > On Fri, May 30, 2025 at 12:44 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> > > @@ -1714,19 +1770,24 @@ static int madvise_do_behavior(struct mm_struct *mm,
> > > unsigned long start, size_t len_in,
> > > struct madvise_behavior *madv_behavior)
> > > {
> > > + struct vm_area_struct *vma = madv_behavior->vma;
> > > int behavior = madv_behavior->behavior;
> > > +
> > > struct blk_plug plug;
> > > unsigned long end;
> > > int error;
> > >
> > > if (is_memory_failure(behavior))
> > > return madvise_inject_error(behavior, start, start + len_in);
> > > - start = untagged_addr_remote(mm, start);
> > > + start = untagged_addr(start);
> >
> > Why is this okay? I see that X86's untagged_addr_remote() asserts that
> > the mmap lock is held, which is no longer the case here with your
> > patch, but untagged_addr() seems wrong here, since we can be operating
> > on another process. I think especially on X86 with 5-level paging and
> > LAM, there can probably be cases where address bits are used for part
> > of the virtual address in one task while they need to be masked off in
> > another task?
> >
> > I wonder if you'll have to refactor X86 and Risc-V first to make this
> > work... ideally by making sure that their address tagging state
> > updates are atomic and untagged_area_remote() works locklessly.
>
> If possible, can we try to avoid this at least for this stage? We all
> agree that
> a per-VMA lock for DONTNEED is long overdue. The main goal of the patch
> is to drop the mmap_lock for high-frequency madvise operations like
> MADV_DONTNEED and potentially MADV_FREE. For these two cases, it's highly
> unlikely that one process would be managing the memory of another. In v2,
> we're modifying common code, which is why we ended up here.
>
> We could consider doing:
>
> if (current->mm == mm)
> untagged_addr(start);
> else
> untagged_addr_remote(mm, start);

Ah, in other words, basically make it so that for now we don't use VMA
locking on remote processes, and then we can have two different paths
here for "local operation" and "MM-locked operation"? That's not very
pretty but from a correctness perspective I'm fine with that.