I think it may actually be difficult to do on some level or there was some
reason we couldn't, but I may be mistaken.
Down the rabbit hole we go..
The cloning of the tree happens by copying the tree in DFS order and
replacing the old nodes with new nodes.  The tree leaves end up being
copied too, and those hold the pointers to all the vmas (unless
DONT_COPY is set, so basically always all of them..).  Once the tree is
copied, we have a duplicate of the tree whose slots still point to all
the vmas in the old process.
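Roughly, the flow looks like this - a simplified sketch of the idea,
not the literal dup_mmap() code (dup_tree_sketch() is a made-up name;
__mt_dup(), vm_area_dup() and mas_store() are the real helpers, but
locking and the rest of the vma setup are hand-waved):

/*
 * Sketch: clone the whole tree first, then walk the clone and replace
 * each slot (an old-vma pointer) with a freshly allocated copy.
 */
static int dup_tree_sketch(struct mm_struct *mm, struct mm_struct *oldmm)
{
        MA_STATE(mas, &mm->mm_mt, 0, 0);
        struct vm_area_struct *old, *new;
        int ret;

        ret = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_KERNEL);
        if (ret)
                return ret;

        mas_for_each(&mas, old, ULONG_MAX) {
                new = vm_area_dup(old);         /* may fail with -ENOMEM */
                if (!new)
                        return -ENOMEM;         /* failure point; slots from
                                                 * here on still point into
                                                 * oldmm (marker store below) */
                mas_store(&mas, new);           /* slot-exact replacement */
        }
        return 0;
}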
The way this fails is that we're unable to finish the clone-and-replace,
usually for out of memory reasons.  So we end up with a tree holding a
mix of new and exciting vmas that have never been used and pointers to
old, but still active, vmas in oldmm.
The failure point is then marked with an XA_ZERO_ENTRY, which is
guaranteed to store successfully because it's a direct slot replacement
in the tree, so no allocations are necessary.  That makes it safe even
in -ENOMEM scenarios.
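Continuing the sketch above, the marker store at the failure point would
look something like this (again an approximation from memory, not the
exact fork.c code):

        /*
         * Overwriting the existing slot with XA_ZERO_ENTRY is a 1:1
         * replacement, so no new nodes are needed and the store cannot
         * fail, even though we got here because of -ENOMEM.
         */
        mas_set_range(&mas, old->vm_start, old->vm_end - 1);
        mas_store(&mas, XA_ZERO_ENTRY);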
Clearing out the stale data is where it gets awkward: removing vmas from
the new tree may itself require allocations, because the maple tree is
built from allocated nodes - we'll need to rebalance, allocate new
parents, etc, etc.  So, just to remove the stale data, we may actually
have to allocate memory.
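To make the contrast with the marker store explicit: erasing an entry is
a range store of NULL, and that store can need new nodes.  Something
like this, continuing with the same mas (a sketch, not actual removal
code):

        /*
         * Unlike the 1:1 XA_ZERO_ENTRY replacement, clearing a range can
         * merge/rebalance nodes, so it may need to allocate - and can
         * fail with -ENOMEM, which is exactly the state we're in.
         */
        mas_set_range(&mas, old->vm_start, old->vm_end - 1);
        ret = mas_store_gfp(&mas, NULL, GFP_KERNEL);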
But we're most likely out of memory.. and we don't want to get the
shrinker involved in a broken task teardown, especially since it has
already been run and failed to help..
We could replace all the old vmas with XA_ZERO_ENTRY, but that doesn't
really fix this issue either.
I could make a function that frees all new vmas and destroys the tree
specifically for this failure state?
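Hypothetically, something like this (free_partial_dup() is a made-up
name; locking and the per-vma teardown details - anon_vma, file refs,
etc. - are hand-waved):

/*
 * Hypothetical cleanup for the dup_mmap() failure state: free the vma
 * copies that already made it into the new tree, then tear the tree
 * down.  Everything at and past the XA_ZERO_ENTRY marker still belongs
 * to oldmm and must not be touched.
 */
static void free_partial_dup(struct mm_struct *mm)
{
        MA_STATE(mas, &mm->mm_mt, 0, 0);
        struct vm_area_struct *vma;

        mas_for_each(&mas, vma, ULONG_MAX) {
                if (xa_is_zero(vma))
                        break;                  /* failure marker: the rest
                                                 * is oldmm's */
                vm_area_free(vma);              /* only the copies we made */
        }
        /* __mt_destroy() only frees nodes, so teardown needs no allocation */
        __mt_destroy(&mm->mm_mt);
}

That would leave the mm with an empty tree and zero vmas.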
I'm almost certain this will lead to another whack-a-mole situation, but
the affected code paths _should_ already be checked, or written to cope,
when there are zero vmas in an mm (going by experience of what the
scheduler does with an empty tree).  Syzbot sometimes finds these
scenarios via signals or other corner cases..
Then again, I also thought the unstable mm should be checked where
necessary to avoid assumptions on the mm state..?
This is funny because we already have a (probably) benign race with oom
here.  This code may already visit the mm after __oom_reap_task_mm() has
run and the mm has disappeared, but since the anon vmas should be
removed, unuse_mm() will skip them.
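For reference, the swapoff side looks roughly like this (paraphrased
from mm/swapfile.c from memory, so treat the details as approximate):

/*
 * Paraphrase of unuse_mm(): it only walks whatever vmas are still in
 * the tree, and only bothers with ones that have an anon_vma.  An empty
 * tree, or vmas with nothing anonymous left, means it does nothing.
 */
static int unuse_mm_sketch(struct mm_struct *mm, unsigned int type)
{
        struct vm_area_struct *vma;
        int ret = 0;
        VMA_ITERATOR(vmi, mm, 0);

        mmap_read_lock(mm);
        for_each_vma(vmi, vma) {
                if (vma->anon_vma) {
                        ret = unuse_vma(vma, type);     /* real per-vma helper */
                        if (ret)
                                break;
                }
                cond_resched();
        }
        mmap_read_unlock(mm);
        return ret;
}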
Although, I'm not sure what happens when
mmu_notifier_invalidate_range_start_nonblock() fails AND unuse_mm() is
then called on the mm.  Maybe checking for the unstable mm is necessary
here anyway?