Re: [PATCH] mm: swap: check for xa_zero_entry() on vma in swapoff path
From: David Hildenbrand
Date: Mon Aug 11 2025 - 09:04:00 EST
On 11.08.25 14:14, Lorenzo Stoakes wrote:
On Mon, Aug 11, 2025 at 03:13:14PM +0530, Charan Teja Kalla wrote:
Thanks, David, for the reply!
On 8/8/2025 5:34 PM, David Hildenbrand wrote:
	if (mpnt) {
		mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
		mas_store(&vmi.mas, XA_ZERO_ENTRY);
		/* Avoid OOM iterating a broken tree */
		set_bit(MMF_OOM_SKIP, &mm->flags);
	}
	/*
	 * The mm_struct is going to exit, but the locks will be dropped
	 * first. Set the mm_struct as unstable is advisable as it is
	 * not fully initialised.
	 */
	set_bit(MMF_UNSTABLE, &mm->flags);
}
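A minimal userspace sketch (not kernel code) of why a VMA walker has to recognise the failure marker stored above. It assumes a flat array of slots standing in for the maple tree, with a hypothetical MODEL_ZERO_ENTRY sentinel playing the role of XA_ZERO_ENTRY and a hypothetical model_is_zero() playing the role of xa_is_zero(). It also assumes that slots from the marker onward were never copied (they still refer to the parent's VMAs), so the walker stops there instead of treating them as the child's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for XA_ZERO_ENTRY: the address of a private object,
 * so it can never collide with a real VMA pointer. */
static char model_zero_marker;
#define MODEL_ZERO_ENTRY ((void *)&model_zero_marker)

struct model_vma {
	unsigned long start;
	unsigned long end;
};

/* Models xa_is_zero(): is this slot the "fork failed here" marker? */
static int model_is_zero(const void *entry)
{
	return entry == MODEL_ZERO_ENTRY;
}

/*
 * Models a swapoff-style walk over the child's VMA slots. NULL slots are
 * gaps and are skipped. On the marker the walk stops rather than
 * dereferencing the marker (or the uncopied parent VMAs after it).
 * Returns the number of bytes covered by the VMAs visited.
 */
static unsigned long model_walk(void *slots[], size_t n)
{
	unsigned long bytes = 0;

	for (size_t i = 0; i < n; i++) {
		if (slots[i] == NULL)
			continue;
		if (model_is_zero(slots[i]))
			break;
		const struct model_vma *vma = slots[i];
		assert(vma->start < vma->end); /* sanity check on model data */
		bytes += vma->end - vma->start;
	}
	return bytes;
}
```

Without the model_is_zero() check, the loop would cast the marker to a struct model_vma and read through it, which is the class of bug the patch subject describes for the swapoff path.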
Shouldn't we just remove anything from the tree here that was not copied
immediately?
Another fix would be to just check MMF_UNSTABLE in unuse_mm(). But
having these MMF_UNSTABLE checks all over the place feels a bit like
whack-a-mole.
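The MMF_UNSTABLE alternative can be modelled the same way: one per-mm check at the top of the swapoff walk instead of a per-VMA marker check. In this sketch MODEL_MMF_UNSTABLE, struct model_mm, model_unuse_mm() and model_unuse_all() are hypothetical stand-ins. Real kernel code would presumably test the bit with test_bit() on mm->flags, mirroring the set_bit() calls quoted above, and the real unuse_mm() returns an error code rather than a count. Every other walker that can reach such an mm would need a similar check, which is the whack-a-mole concern.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit number standing in for MMF_UNSTABLE. */
#define MODEL_MMF_UNSTABLE 0

struct model_mm {
	unsigned long flags;
	size_t nr_vmas; /* VMAs a swapoff walk would visit in this mm */
};

/* Userspace stand-in for test_bit() on a single word. */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(*addr) * CHAR_BIT));
	return (*addr >> nr) & 1UL;
}

/* Models the per-mm check: skip a half-built mm before touching its tree.
 * Returns how many VMAs were walked. */
static size_t model_unuse_mm(const struct model_mm *mm)
{
	if (model_test_bit(MODEL_MMF_UNSTABLE, &mm->flags))
		return 0;
	return mm->nr_vmas;
}

/* Models swapoff visiting every mm that may hold swap entries. */
static size_t model_unuse_all(const struct model_mm mms[], size_t n)
{
	size_t walked = 0;

	for (size_t i = 0; i < n; i++)
		walked += model_unuse_mm(&mms[i]);
	return walked;
}
```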
It seems MMF_UNSTABLE is the expected mechanism, per commit
64c37e134b12 ("kernel: be more careful about dup_mmap() failures and
uprobe registering"). Excerpts from the commit message:
This really is whack-a-mole yeah.
This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on
the oom side (even though this is extremely unlikely to be selected as
an oom victim in the race window), and sets MMF_UNSTABLE to avoid
other potential users from using a partially initialised mm_struct.
But... maybe this is better for the _hotfix_ version as a nicer way of
doing this.
I would prefer using MMF_UNSTABLE as a hotfix.
When registering vmas for uprobe, skip the vmas in an mm that is marked
unstable. Modifying a vma in an unstable mm may cause issues if the mm
isn't fully initialised.
Is there anything preventing us from just leaving a proper tree that
reflects reality in place before we drop the write lock?
By "proper tree", do you mean what you asked previously:
Shouldn't we just remove anything from the tree here that was not copied
immediately?
Commit d24062914837 ("fork: use __mt_dup() to duplicate maple tree in
dup_mmap()") did this for efficiency, so it'd be a regression to do this.
We're talking about the case where fork *fails*. That cannot possibly be
relevant for performance, can it? :)
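A userspace sketch of the "proper tree" alternative discussed above: on failure, clear the failing slot and everything after it, so ordinary walkers need neither a marker check nor an MMF_UNSTABLE check. All names here are hypothetical and the flat array only stands in for the maple tree; the kernel equivalent would be some range store over the uncopied tail, whose exact form is not shown here. The extra pass is proportional to the number of uncopied VMAs and runs only when fork fails, so the fast path is untouched.

```c
#include <assert.h>
#include <stddef.h>

struct model_vma {
	unsigned long start;
	unsigned long end;
};

/*
 * Error-path cleanup: the slot at failed_idx and every slot after it still
 * refer to the parent's VMAs (duplication stopped there), so clear them.
 * Afterwards the child's table holds only VMAs that were really copied.
 */
static void model_trim_after_failure(struct model_vma *slots[], size_t n,
				     size_t failed_idx)
{
	assert(failed_idx <= n);
	for (size_t i = failed_idx; i < n; i++)
		slots[i] = NULL;
}

/* An ordinary walker: with the table trimmed, no special check is needed. */
static unsigned long model_total_mapped(struct model_vma *slots[], size_t n)
{
	unsigned long bytes = 0;

	for (size_t i = 0; i < n; i++)
		if (slots[i] != NULL)
			bytes += slots[i]->end - slots[i]->start;
	return bytes;
}
```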
--
Cheers,
David / dhildenb