Re: [PATCH v3 6/6] mm: swap: entirely map large folios found in swapcache

From: Barry Song
Date: Tue May 07 2024 - 06:50:08 EST


On Tue, May 7, 2024 at 6:39 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 07.05.24 11:24, Barry Song wrote:
> > On Tue, May 7, 2024 at 8:59 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >>>> Let's assume a single subpage of a large folio is no longer mapped.
> >>>> Then, we'd have:
> >>>>
> >>>> nr_pages == folio_nr_pages(folio) - 1.
> >>>>
> >>>> You could simply map+reuse most of the folio without COWing.
> >>>
> >>> yes. This is good but the pte which is no longer mapped could be
> >>> anyone within the nr_pages PTEs. so it could be quite tricky for
> >>> set_ptes.
> >>
> >> The swap batching logic should take care of that, otherwise it would be
> >> buggy.
> >
> > When you mention "it would be buggy," are you also referring to the current
> > fallback approach? or only refer to the future patch which might be able
> > to map/reuse "nr_pages - 1" pages?
>
> swap_pte_batch() should not skip any holes. So consequently, set_ptes()
> should do the right thing. (regarding your comment "could be quite ricky
> for set_ptes")
>
> So I think that should be working as expected.

maybe not. take a look at my current code, I am goto check_folio with
nr_pages = 1
if swap_pte_batch(folio_ptep, nr, folio_pte) != folio_nr_pages(folio).

+ nr_pages = 1;
+ ...
+ if (folio_test_large(folio) && folio_test_swapcache(folio)) {
+ int nr = folio_nr_pages(folio);
+ ...
+ if (!pte_same(folio_pte,
pte_move_swp_offset(vmf->orig_pte, -idx)) ||
+ swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
+ goto check_folio; /* read here, i am falling
back nr_pages = 1 */
+
+
+ ...
+ nr_pages = nr;

The fallback(=1) works. but it seems you are proposing set nr_pages =
swap_pte_batch(folio_ptep, nr, folio_pte)
if (swap_pte_batch(folio_ptep, nr, folio_pte) > 1 &&
swap_pte_batch(folio_ptep, nr, folio_pte) <
nr_pages) ?

>
> >
> > The current patch falls back to setting nr_pages = 1 without mapping or
> > reusing nr_pages - 1. I feel your concern doesn't refer to this fallback?
> >
> >>
> >>>
> >>>>
> >>>> Once we support COW reuse of PTE-mapped THP we'd do the same. Here, it's
> >>>> just easy to detect that the folio is exclusive (folio_ref_count(folio)
> >>>> == 1 before mapping anything).
> >>>>
> >>>> If you really want to mimic what do_wp_page() currently does, you should
> >>>> have:
> >>>>
> >>>> exclusive || (folio_ref_count(folio) == 1 && !folio_test_large(folio))
> >>>
> >>> I actually dislike the part that do_wp_page() handles the reuse of a large
> >>> folio which is entirely mapped. For example, A forks B, B exit, we write
> >>> A's large folio, we get nr_pages CoW of small folios. Ideally, we can
> >>> reuse the whole folios for writing.
> >>
> >> Yes, see the link I shared to what I am working on. There isn't really a
> >> question if what we do right now needs to be improved and all these
> >> scenarios are pretty obvious clear.
> >
> > Great! I plan to dedicate more time to reviewing your work.
>
> Nice! And there will be a lot of follow-up optimization work I won't
> tackle immediately regarding COW (COW-reuse around, maybe sometimes we
> want to COW bigger chunks).
>
> I still have making PageAnonExclusive a per-folio flag on my TODO list,
> that will help the COW-reuse around case a lot.
>
> >
> >>
> >>>
> >>>>
> >>>> Personally, I think we should keep it simple here and use:
> >>>>
> >>>> exclusive || folio_ref_count(folio) == 1
> >>>
> >>> I feel this is still better than
> >>> exclusive || (folio_ref_count(folio) == 1 && !folio_test_large(folio))
> >>> as we reuse the whole large folio. the do_wp_page() behaviour
> >>> doesn't have this.
> >>
> >> Yes, but there is the comment "Same logic as in do_wp_page();". We
> >> already ran into issues having different COW reuse logic all over the
> >> place. For this case here, I don't care if we leave it as
> >>
> >> "exclusive || folio_ref_count(folio) == 1"
> >
> > I'm perfectly fine with using the code for this patchset and maybe
> > looking for other
> > opportunities for potential optimization as an incremental patchset,
> > for example,
> > reusing the remaining PTEs as suggested by you - "simply map+reuse most of
> > the folio without COWing."
> >
> >>
> >> But let's not try inventing new stuff here.
> >
> > It seems you ignored and snipped my "16 + 14" pages and "15" pages
> > example though. but once we support "simply map+reuse most of the
> > folio without COWing", the "16+14" problem can be resolved, instead,
> > we consume 16 pages.
>
>
> Oh, sorry for skipping that, for me it was rather clear: the partially
> mapped folios will be on the deferred split list and the excess memory
> can (and will be) reclaimed when there is need. So this temporary memory
> consumption is usually not a problem in practice. But yes, something to
> optimize (just like COW reuse in general).
>
> --
> Cheers,
>
> David / dhildenb
>