Re: [PATCH] mm: swap: check for xa_zero_entry() on vma in swapoff path

From: David Hildenbrand
Date: Mon Aug 11 2025 - 09:04:00 EST


On 11.08.25 14:14, Lorenzo Stoakes wrote:
On Mon, Aug 11, 2025 at 03:13:14PM +0530, Charan Teja Kalla wrote:
Thanks David, for the reply!!
On 8/8/2025 5:34 PM, David Hildenbrand wrote:
        if (mpnt) {
            mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
            mas_store(&vmi.mas, XA_ZERO_ENTRY);
            /* Avoid OOM iterating a broken tree */
            set_bit(MMF_OOM_SKIP, &mm->flags);
        }
        /*
         * The mm_struct is going to exit, but the locks will be dropped
         * first.  Set the mm_struct as unstable is advisable as it is
         * not fully initialised.
         */
        set_bit(MMF_UNSTABLE, &mm->flags);
    }

Shouldn't we just remove anything from the tree here that was not copied
immediately?

Another fix would be to just check MMF_UNSTABLE in unuse_mm(). But
having these MMF_UNSTABLE checks all over the place feels a bit like
whack-a-mole.
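
For illustration only, here is a minimal userspace sketch of what such a check amounts to. Everything prefixed sketch_/SKETCH_ is a hypothetical stand-in, not the kernel's definitions, and the walk is a toy model rather than the real unuse_mm(): the walker bails out before it can touch a marker entry that was left behind in a half-built tree.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are the kernel definitions. */
#define SKETCH_MMF_UNSTABLE (1u << 0)

struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Plays the role of XA_ZERO_ENTRY: a non-NULL marker that is not a real vma. */
static const struct sketch_vma sketch_zero_entry;

struct sketch_mm {
	unsigned int flags;
	const struct sketch_vma *const *slots;
	size_t nr_slots;
};

/*
 * Toy model of a walker guarded by the flag. Returns the number of bytes
 * covered by the vmas it visited; an unstable mm is skipped entirely, so
 * the marker entry is never interpreted as a vma.
 */
static unsigned long sketch_unuse_mm(const struct sketch_mm *mm)
{
	unsigned long covered = 0;

	if (mm->flags & SKETCH_MMF_UNSTABLE)
		return 0;

	for (size_t i = 0; i < mm->nr_slots; i++)
		covered += mm->slots[i]->vm_end - mm->slots[i]->vm_start;

	return covered;
}
```

The downside is the one noted above: every walker that can reach such an mm needs the same guard.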

Seems MMF_UNSTABLE is the expectation per commit
64c37e134b12 ("kernel: be more careful about dup_mmap() failures and
uprobe registering"). Excerpt(s) from the commit message:

This really is whack-a-mole yeah.


This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on
the oom side (even though this is extremely unlikely to be selected as
an oom victim in the race window), and __sets MMF_UNSTABLE to avoid
other potential users from using a partially initialised mm_struct.


But... maybe this is better for the _hotfix_ version as a nicer way of
doing this.

I would prefer using MMF_UNSTABLE as a hotfix.


When registering vmas for uprobe, skip the vmas in an mm that is marked
unstable. Modifying a vma in an unstable mm may cause issues if the mm
isn't fully initialised.__

Is there anything preventing us from just leaving a proper tree that
reflects reality in place before we drop the write lock?

By "proper tree", do you mean your previous question? --
Shouldn't we just remove anything from the tree here that was not copied
immediately?

Commit d24062914837 ("fork: use __mt_dup() to duplicate maple tree in
dup_mmap()") did this for efficiency, so it'd be a regression to do this.

We're talking about the case where fork *fails*. That cannot possibly be relevant for performance, can it? :)
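
To make the "proper tree" idea concrete, here is a rough userspace sketch. It is a toy array model with hypothetical toy_* names, not the maple tree API, and it assumes the tree was duplicated up front so that entries past the failure point still refer to the parent's vmas. On the failure path only, those entries are dropped so the tree holds nothing but what was really copied.

```c
#include <stddef.h>

/* Hypothetical toy types; a fixed array stands in for the duplicated tree. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

struct toy_tree {
	const struct toy_vma *slots[8];
	size_t nr_slots;
};

/*
 * Failure path only: entries at index >= nr_copied were never replaced by
 * the child's own copies, so remove them instead of leaving a marker.
 */
static void toy_drop_uncopied(struct toy_tree *tree, size_t nr_copied)
{
	if (nr_copied > tree->nr_slots)
		nr_copied = tree->nr_slots;

	for (size_t i = nr_copied; i < tree->nr_slots; i++)
		tree->slots[i] = NULL;

	tree->nr_slots = nr_copied;
}
```

The extra loop only runs when the fork has already failed, so the fast path that the up-front duplication optimises is untouched in this model.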

--
Cheers,

David / dhildenb