Re: [PATCH] mm: Stop kswapd early when nothing's waiting for it to free pages

From: Shakeel Butt
Date: Wed Feb 26 2020 - 12:01:12 EST


On Wed, Feb 26, 2020 at 1:08 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 25-02-20 14:30:03, Shakeel Butt wrote:
> > On Tue, Feb 25, 2020 at 1:10 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > [snip]
> > >
> > > The proper fix should, however, check the amount of reclaimable pages
> > > and back off if they cannot meet the target IMO. We cannot rely on the
> > > general reclaimability here because that could really be thrashing.
> > >
> >
> > "check the amount of reclaimable pages" vs "cannot rely on the general
> > reclaimability"? Can you clarify?
>
> kswapd targets the high watermark and if your reclaimable memory (aka
> zone_reclaimable_pages) is lower than the high wmark then it simply
> cannot satisfy that target, right? Continuing to reclaim in that situation
> seems counterproductive to me because you keep evicting pages that
> might be reused without any feedback mechanism on the actual usage.
> Please see my other reply.
>

I understand and agree with the argument that if reclaimable pages are
below the high wmark then there is no need to reclaim. Regarding not
depending on general reclaimability, I thought you meant that even if
reclaimable pages are above the high wmark, we might not want to continue
reclaiming, to avoid thrashing. Is that right?
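
A minimal user-space sketch of the first half of that argument, the
back-off check. struct zone_model and kswapd_can_meet_target() are
hypothetical names used only for illustration; this is not kernel code,
and the two fields merely stand in for zone_reclaimable_pages() and the
zone's high watermark:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a zone; not the kernel's struct zone. */
struct zone_model {
	unsigned long reclaimable_pages; /* stands in for zone_reclaimable_pages() */
	unsigned long high_wmark_pages;  /* stands in for the zone's high watermark */
};

/*
 * Returns true if reclaim could, in principle, reach the high watermark.
 * When it returns false, reclaiming everything reclaimable would still
 * fall short of the target, so continuing only evicts pages that might
 * be reused.
 */
static bool kswapd_can_meet_target(const struct zone_model *z)
{
	assert(z != NULL);
	return z->reclaimable_pages >= z->high_wmark_pages;
}
```

This covers only the "reclaimable < high wmark" case. It says nothing
about the second case, where reclaimable pages exceed the high wmark but
continued reclaim would still thrash.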

> > BTW we are seeing a similar situation in our production environment.
> > With swappiness=0, no swap from kswapd (because we don't swap out on
> > pressure, only on cold age) and too few file pages, kswapd goes
> > crazy on shrink_slab and spends 100% CPU on it.
>
> I am not sure this is the same problem. It seems that the slab shrinkers
> are not really a bottleneck here. I would recommend that you identify
> which shrinkers are eating the CPU time in your case.
>

The perf profile shows that kswapd is spending almost all of its time
in list_lru_count_one and in memcg tree traversal. So, it's not just one
shrinker.
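
A rough user-space model of why the counting alone can dominate. It
assumes, purely for illustration, that every pass visits every memcg and
asks every shrinker for an object count, even when nothing is freeable.
struct pass_stats and model_slab_pass() are hypothetical names, and the
counts array only stands in for what a list_lru_count_one-style lookup
would return:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one modelled slab-shrink pass. */
struct pass_stats {
	unsigned long count_calls; /* number of per-(memcg, shrinker) count lookups */
	unsigned long freeable;    /* total objects reported as freeable */
};

/*
 * counts[m * nr_shrinkers + s] is the object count that shrinker s
 * reports for memcg m. The lookup is performed for every pair, so the
 * work per pass is nr_memcgs * nr_shrinkers regardless of the result.
 */
static struct pass_stats model_slab_pass(const unsigned long *counts,
					 size_t nr_memcgs,
					 size_t nr_shrinkers)
{
	struct pass_stats st = { 0, 0 };

	assert(counts != NULL);
	for (size_t m = 0; m < nr_memcgs; m++) {
		for (size_t s = 0; s < nr_shrinkers; s++) {
			st.count_calls++;
			st.freeable += counts[m * nr_shrinkers + s];
		}
	}
	return st;
}
```

Under that assumption, a pass over many memcgs pays the full traversal
cost even when the freeable total comes back as zero, which would be
consistent with the time showing up in counting rather than in any one
shrinker.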

Yes, it's not exactly the same problem, but I would say it is similar.
For Sultan's issue, even if there are many reclaimable pages, we don't
want to thrash. In our case, thrashing is not happening but kswapd
is going nuts on slab shrinkers.

Shakeel