Re: [PATCH 1/2] mm: zswap: increase shrinking protection for zswap swapins only

From: Nhat Pham
Date: Wed Mar 20 2024 - 10:48:32 EST


On Wed, Mar 20, 2024 at 2:51 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Mar 20, 2024 at 02:08:22AM +0000, Yosry Ahmed wrote:
> > Currently, the number of protected zswap entries corresponding to an
> > lruvec are incremented every time we swapin a page.
>
> Correct. This is the primary signal that the shrinker is being too
> aggressive in moving entries to disk and should slow down...?

Yup. Currently, there are two scenarios in which we increase zswap
protection area:

1. zswap lru movement - for instance, if a page for some reasons is
rotated in the LRU. When this happens, we increment the protection
size, so that the page at the tail end of the protected area does not
lose its protection because of (potentially) spurious LRU churnings.
2. swapin - when this happens, it is a signal that the shrinker is
being too... enthusiastic in its reclaiming action. We should be more
conservative, hence the increase in protection.

I think there's some confusion around this, because we use the
same-ish mechanism for two different events. Maybe I should have put
yet another fat comment at the callsites of zswap_folio_swapin() too
:)

>
> > This happens regardless of whether or not the page originated in
> > zswap. Hence, swapins from disk will lead to increasing protection
> > on potentially stale zswap entries. Furthermore, the increased

Hmmm my original thinking was that, had we protected the swapped-in
page back when it was still in zswap, we would have prevented this IO.
And since the pages in zswap are (at least conceptually) "warmer" than
the swapped in page, it is appropriate to increase the zswap
protection.

In fact, I was toying with the idea to max out the protection on
swap-in to temporarily cease all zswap shrinking, but that is perhaps
overkill :)

> > shrinking protection can lead to more pages skipping zswap and going

I think this can only happen when the protection is so strong that the
zswap pool is full, right?

In practice I have never seen this happen though. Did you observe it?
We have a fairly aggressive lru-size-based protection decaying as
well, so that should not happen...

Also technically the protection only applies to the dynamic shrinker -
the capacity-driven shrinking is not affected (although it's not very
effective - perhaps we can rework this?)

> > to disk, eventually leading to even more swapins from disk and
> > starting a vicious circle.
>
> How does shrinker protection affect zswap stores?
>
> On the contrary, I would expect this patch to create a runaway
> shrinker. The more aggressively it moves entries out to disk, the
> lower the rate of zswap loads, the more aggressively it moves more
> entries out to disk.
>
> > Instead, only increase the protection when pages are loaded from zswap.
> > This also has a nice side effect of removing zswap_folio_swapin() and
> > replacing it with a static helper that is only called from zswap_load().
> >
> > No problems were observed in practice, this was found through code
> > inspection.
>
> This is missing test results :)

Yeah it'd be nice to see benchmark numbers :)

I mean all this protection is heuristics anyway, so maybe I'm just
being overly conservative. But only numbers can show this.