Re: [PATCH v3 6/6] mm: swap: entirely map large folios found in swapcache

From: Barry Song
Date: Mon May 06 2024 - 08:27:25 EST


On Tue, May 7, 2024 at 12:05 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 03.05.24 02:50, Barry Song wrote:
> > From: Chuanhua Han <hanchuanhua@xxxxxxxx>
> >
> > When a large folio is found in the swapcache, the current implementation
> > requires calling do_swap_page() nr_pages times, resulting in nr_pages
> > page faults. This patch opts to map the entire large folio at once to
> > minimize page faults. Additionally, redundant checks and early exits
> > for ARM64 MTE restoring are removed.
> >
> > Signed-off-by: Chuanhua Han <hanchuanhua@xxxxxxxx>
> > Co-developed-by: Barry Song <v-songbaohua@xxxxxxxx>
> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> > ---
> > mm/memory.c | 60 ++++++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 48 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 22e7c33cc747..940fdbe69fa1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3968,6 +3968,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  	pte_t pte;
> >  	vm_fault_t ret = 0;
> >  	void *shadow = NULL;
> > +	int nr_pages = 1;
> > +	unsigned long page_idx = 0;
> > +	unsigned long address = vmf->address;
> > +	pte_t *ptep;
> >
> >  	if (!pte_unmap_same(vmf))
> >  		goto out;
> > @@ -4166,6 +4170,36 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  		goto out_nomap;
> >  	}
> >
> > +	ptep = vmf->pte;
> > +	if (folio_test_large(folio) && folio_test_swapcache(folio)) {
> > +		int nr = folio_nr_pages(folio);
> > +		unsigned long idx = folio_page_idx(folio, page);
> > +		unsigned long folio_start = vmf->address - idx * PAGE_SIZE;
> > +		unsigned long folio_end = folio_start + nr * PAGE_SIZE;
> > +		pte_t *folio_ptep;
> > +		pte_t folio_pte;
> > +
> > +		if (unlikely(folio_start < max(vmf->address & PMD_MASK, vma->vm_start)))
> > +			goto check_folio;
> > +		if (unlikely(folio_end > pmd_addr_end(vmf->address, vma->vm_end)))
> > +			goto check_folio;
> > +
> > +		folio_ptep = vmf->pte - idx;
> > +		folio_pte = ptep_get(folio_ptep);
> > +		if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
> > +		    swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
> > +			goto check_folio;
> > +
> > +		page_idx = idx;
> > +		address = folio_start;
> > +		ptep = folio_ptep;
> > +		nr_pages = nr;
> > +		entry = folio->swap;
> > +		page = &folio->page;
> > +	}
> > +
> > +check_folio:
> > +
> >  	/*
> >  	 * PG_anon_exclusive reuses PG_mappedtodisk for anon pages. A swap pte
> >  	 * must never point at an anonymous page in the swapcache that is
> > @@ -4225,12 +4259,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  	 * We're already holding a reference on the page but haven't mapped it
> >  	 * yet.
> >  	 */
> > -	swap_free_nr(entry, 1);
> > +	swap_free_nr(entry, nr_pages);
> >  	if (should_try_to_free_swap(folio, vma, vmf->flags))
> >  		folio_free_swap(folio);
> >
> > -	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> > -	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> > +	folio_ref_add(folio, nr_pages - 1);
> > +	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > +	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
> >  	pte = mk_pte(page, vma->vm_page_prot);
> >
> >  	/*
> > @@ -4240,34 +4275,35 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  	 * exclusivity.
> >  	 */
> >  	if (!folio_test_ksm(folio) &&
> > -	    (exclusive || folio_ref_count(folio) == 1)) {
> > +	    (exclusive || (folio_ref_count(folio) == nr_pages &&
> > +	     folio_nr_pages(folio) == nr_pages))) {
> >  		if (vmf->flags & FAULT_FLAG_WRITE) {
> >  			pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> >  			vmf->flags &= ~FAULT_FLAG_WRITE;
>
> I fail to convince myself that this change is correct, and if it is
> correct, it's confusing (I think there is a dependency on
> folio_free_swap() having been called and succeeding, such that we don't
> have a folio that is in the swapcache at this point).
>
> Why can't we move the folio_ref_add() after this check and just leave
> the check as it is?
>
> "folio_ref_count(folio) == 1" is as clear as it gets: we hold the single
> reference, so we can do with this thing whatever we want: it's certainly
> exclusive. No swapcache, no other people mapping it.

Right.
I believe the code works correctly but is a bit confusing. As you said,
we might move folio_ref_add() to after the folio_ref_count(folio) == 1
check and leave the check as it is.
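
A minimal userspace sketch of the two orderings, not kernel code: struct
model_folio and both helper names are hypothetical, and the model ignores
locking, atomics, and any references held by the swapcache. It is only meant
to show that, when the whole folio is being mapped, checking "== nr_pages"
after folio_ref_add(folio, nr_pages - 1) and checking "== 1" before it accept
the same starting reference count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct folio; not the kernel type. */
struct model_folio {
	int refcount;	/* models folio_ref_count() */
	int nr_pages;	/* models folio_nr_pages() */
	bool ksm;	/* models folio_test_ksm() */
};

/* Ordering in the v3 patch: take the extra references first, then check. */
static bool exclusive_check_v3(struct model_folio *f, bool exclusive,
			       int nr_pages)
{
	assert(nr_pages >= 1);
	f->refcount += nr_pages - 1;	/* folio_ref_add(folio, nr_pages - 1) */
	return !f->ksm &&
	       (exclusive || (f->refcount == nr_pages &&
			      f->nr_pages == nr_pages));
}

/* Ordering suggested in review: keep the "== 1" check, then take references. */
static bool exclusive_check_reordered(struct model_folio *f, bool exclusive,
				      int nr_pages)
{
	bool ok;

	assert(nr_pages >= 1);
	ok = !f->ksm && (exclusive || f->refcount == 1);
	f->refcount += nr_pages - 1;	/* folio_ref_add() moved after the check */
	return ok;
}
```

In the reordered form the condition reads directly as "we hold the only
reference", and no longer depends on nr_pages.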

>
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry