Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large folios during reclamation

From: David Hildenbrand
Date: Wed Jun 25 2025 - 09:02:30 EST


On 25.06.25 14:58, Lance Yang wrote:


On 2025/6/25 20:09, David Hildenbrand wrote:
On 25.06.25 13:42, Barry Song wrote:
On Wed, Jun 25, 2025 at 11:27 PM David Hildenbrand <david@xxxxxxxxxx>
wrote:

On 25.06.25 13:15, Barry Song wrote:
On Wed, Jun 25, 2025 at 11:01 PM David Hildenbrand
<david@xxxxxxxxxx> wrote:

On 25.06.25 12:57, Barry Song wrote:

Note that I don't quite understand why we have to batch the
whole thing
or fallback to
individual pages. Why can't we perform other batches that span
only some
PTEs? What's special
about 1 PTE vs. 2 PTEs vs. all PTEs?

That's a good point about the "all-or-nothing" batching logic ;)

It seems the "all-or-nothing" approach is specific to the
lazyfree use
case, which needs to unmap the entire folio for reclamation. If
that's
not possible, it falls back to the single-page slow path.

Other cases advance the PTE themselves, while try_to_unmap_one()
relies
on page_vma_mapped_walk() to advance the PTE. Unless we want to
manually
modify pvmw.pte and pvmw.address outside of
page_vma_mapped_walk(), which
to me seems like a violation of layers. :-)

Please explain to me why the following is not clearer and better:

This part is much clearer, but that doesn’t necessarily improve the
overall
picture. The main challenge is how to exit the iteration of
while (page_vma_mapped_walk(&pvmw)).

Okay, I get what you mean now.


Right now, we have it laid out quite straightforwardly:
                  /* We have already batched the entire folio */
                  if (nr_pages > 1)
                          goto walk_done;


Given that the comment is completely confusing whens seeing the
check ... :)

/*
   * If we are sure that we batched the entire folio and cleared all
PTEs,
   * we can just optimize and stop right here.
   */
if (nr_pages == folio_nr_pages(folio))
         goto walk_done;

would make the comment match.

Yes, that clarifies it.



with any nr between 1 and folio_nr_pages(), we have to consider two
issues:
1. How to skip PTE checks inside page_vma_mapped_walk for entries that
were already handled in the previous batch;

They are cleared if we reach that point. So the pte_none() checks will
simply skip them?

2. How to break the iteration when this batch has arrived at the end.

page_vma_mapped_walk() should be doing that?

It seems you might have missed the part in my reply that says:
"Of course, we could avoid both, but that would mean performing
unnecessary
checks inside page_vma_mapped_walk()."
> > That’s true for both. But I’m wondering why we’re still doing the
check,
even when we’re fairly sure they’ve already been cleared or we’ve reached
the end :-)

:)


Somehow, I feel we could combine your cleanup code—which handles a batch
size of "nr" between 1 and nr_pages—with the
"if (nr_pages == folio_nr_pages(folio)) goto walk_done" check.

Yeah, that's what I was suggesting. It would have to be part of the
cleanup I think.

I'm still wondering if there is a case where

if (nr_pages == folio_nr_pages(folio))
    goto walk_done;

would be wrong when dealing with small folios.

We can make the check more explicit to avoid any future trouble ;)

if (nr_pages > 1 && nr_pages == folio_nr_pages(folio))
goto walk_done;

It should be safe for small folios.

I mean, we are walking a single VMA, and a small folio should never be mapped multiple times into the same VMA. (ignoring KSM, but KSM is handled differently either way)

So we should be good, just wanted to raise it.

--
Cheers,

David / dhildenb