Re: [PATCH 0/3] Limit runaway reclaim due to watermark boosting

From: Andrew Morton
Date: Tue Feb 25 2020 - 21:51:33 EST


On Tue, 25 Feb 2020 14:15:31 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> Ivan Babrou reported the following

http://lkml.kernel.org/r/CABWYdi1eOUD1DHORJxTsWPMT3BcZhz++xP1pXhT=x4SgxtgQZA@xxxxxxxxxxxxxx
is helpful.

> Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when
> an external fragmentation event occurs") introduced undesired
> effects in our environment.
>
> * NUMA with 2 x CPU
> * 128GB of RAM
> * THP disabled
> * Upgraded from 4.19 to 5.4
>
> Before the upgrade, we saw free memory hover at around 1.4GB with no
> spikes. After the upgrade, we saw some machines decide that they
> need a lot more than that, with frequent spikes above 10GB,
> often only on a single NUMA node.
>
> There have been a few reports recently that might be watermark boost
> related. Unfortunately, finding someone who can reproduce the problem
> and test a patch has been problematic. This series intends to limit
> potential damage only.

It's problematic that we don't understand what's happening, and these
palliatives can only make it harder to find out.

Rik seems to have the means to reproduce this (or something similar),
and it seems Ivan can test patches three weeks hence. So how about a
debug patch which will help figure out what's going on in there?