Re: [PATCH v2 4/5] mm: Refector release_pages()

From: Matthew Wilcox
Date: Fri Sep 01 2023 - 00:36:56 EST


On Thu, Aug 31, 2023 at 08:57:51PM +0100, Ryan Roberts wrote:
> On 30/08/2023 20:11, Matthew Wilcox wrote:
> > On Wed, Aug 30, 2023 at 10:50:10AM +0100, Ryan Roberts wrote:
> >> In preparation for implementing folios_put_refs() in the next patch,
> >> refactor release_pages() into a set of helper functions, which can be
> >> reused. The primary difference between release_pages() and
> >> folios_put_refs() is how they iterate over the set of folios. The
> >> per-folio actions are identical.
> >
> > As you noted, we have colliding patchsets. I'm not hugely happy with
> > how patch 4 turned out,
>
> Could you describe the issues as you see them? I'm keen not to repeat the same
> bad patterns in future.

I think you absolutely had the right idea. I'd've probably done the same
as you in the same circumstances as you. It's just that I'd done this
patch series getting rid of / simplifying a bunch of the code you were
modifying, and I thought it'd make your 4/5 irrelevant and 5/5 simpler.

> > void release_unref_folios(struct folio_batch *folios)
> > {
> > struct lruvec *lruvec = NULL;
> > unsigned long flags = 0;
> > int i;
> >
> > for (i = 0; i < folios->nr; i++) {
> > struct folio *folio = folios->folios[i];
> > free_swap_cache(folio);
>
> Agree this can't go here - would put it in pfn_range_put(). But would not want
> it in folios_put() as you suggeted in the other email - that would surely change
> the behaviour of folios_put()?

yes; perhaps there are times when we want to put a batch of folios and
_don't_ want to rip them out of the swapcache. I haven't thought about
this in much detail, and we're definitely venturing into areas of the
MM where I get myself in trouble (ie anonymous memory).

> > __page_cache_release(folio, &lruvec, &flags);
> > }
>
> I think you would need to add:
>
> if (lruvec)
> unlock_page_lruvec_irqrestore(lruvec, flags);

Indeed.

> > mem_cgroup_uncharge_folios(folios);
> > free_unref_folios(folios);
> > }
> >
> > then this becomes:
> >
> > void folios_put(struct folio_batch *folios)
> > {
> > int i, j;
> >
> > for (i = 0, j = 0; i < folios->nr; i++) {
> > struct folio *folio = folios->folios[i];
> >
> > if (is_huge_zero_page(&folio->page))
> > continue;
> > if (folio_is_zone_device(folio)) {
> > if (put_devmap_managed_page(&folio->page))
> > continue;
> > if (folio_put_testzero(folio))
> > free_zone_device_page(&folio->page);
> > continue;
> > }
> >
> > if (!folio_put_testzero(folio))
> > continue;
> > if (folio_test_hugetlb(folio)) {
> > free_huge_folio(folio);
> > continue;
> > }
> >
> > if (j != i)
> > folios->folios[j] = folio;
> > j++;
> > }
> >
> > folios->nr = j;
> > if (!j)
> > return;
> > release_unref_folios(folios);
> > }
> >
> > and pfn_range_put() also becomes shorter and loses all the lruvec work.
>
> something like this?
>
> void pfn_range_put(struct pfn_range *range, unsigned int nr)
> {
> struct folio_batch folios;
> unsigned int i;
>
> folio_batch_init(&folios);
> for (i = 0; i < nr; i++) {
> struct folio *folio = pfn_folio(range[i].start);
> unsigned int refs = range[i].end - range[i].start;
>
> free_swap_cache(folio); // <<<<< HERE? To be equivalent to
> // free_pages_and_swap_cache()
>
> if (is_huge_zero_page(&folio->page))
> continue;
>
> if (folio_is_zone_device(folio)) {
> if (put_devmap_managed_page_refs(&folio->page, refs))
> continue;
> if (folio_ref_sub_and_test(folio, refs))
> free_zone_device_page(&folio->page);
> continue;
> }
>
> if (!folio_ref_sub_and_test(folio, refs))
> continue;
>
> /* hugetlb has its own memcg */
> if (folio_test_hugetlb(folio)) {
> free_huge_folio(folio);
> continue;
> }
>
> if (folio_batch_add(&folios, folio) == 0)
> release_unref_folios(&folios);
> }
>
> if (folios.nr)
> release_unref_folios(&folios);
> }
>
> >
> > Thoughts?
>
> Looks like its getting there to me. Although the bulk of the logic inside the
> loop is still common between folios_put() and pfn_range_put(); perhaps we can
> have a common helper for that, which would just need to take (folio, refs)?
>
> So by adding free_swap_cache() above, we can call pfn_range_put() direct and can
> remove free_pages_and_swap_cache() entirely.

Yes. Maybe this works? I'd like it to!

> What's the best way to move forward here? Two options as I see it:
>
> - I drop patch 4 and create a version of pfn_range_put() that has the same
> semantic as above but essentially just copies the old release_pages() (similar
> to my v1). That keeps my series independent. Then you can replace with the new
> pfn_range_put() as part of your series.
>
> - We can just hook my patches up to the end of your series and do it all in one go.
>
> Opinion?

I'm reluctant to tell you "hey, delay until after this series of mine".
We don't have good data yet on whether my patch series helps or hurts.
Plus I'm about to take a few days off (I'll be back for next Wednesday's
meeting). I think your first option is better (and do feel free to use
a different name from pfn_range_put()) just to keep us from colliding
in ways that ultimately don't matter.