Re: [PATCH] mm: vmscan: fix not scanning anonymous pages when detecting file refaults

From: Johannes Weiner
Date: Fri Jun 28 2019 - 10:23:23 EST


Hi Minchan,

On Fri, Jun 28, 2019 at 03:51:38PM +0900, Minchan Kim wrote:
> On Thu, Jun 27, 2019 at 02:41:23PM -0400, Johannes Weiner wrote:
> > On Wed, Jun 19, 2019 at 04:08:35PM +0800, Kuo-Hsin Yang wrote:
> > > Fixes: 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition")
> > > Signed-off-by: Kuo-Hsin Yang <vovoy@xxxxxxxxxxxx>
> >
> > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> >
> > Your change makes sense - we should indeed not force cache trimming
> > only while the page cache is experiencing refaults.
> >
> > I can't say I fully understand the changelog, though. The problem of
>
> I guess the point of the patch is that the "actual_reclaim" parameter made
> get_scan_count diverge in how it balances the file vs. anon LRUs. Thus, it
> ends up scanning the file LRU active/inactive lists in a file thrashing state.

Look at the patch again. The parameter was only added to retain
existing behavior. We *always* did file-only reclaim while thrashing -
all the way back to the two commits I mentioned below.

> So, Fixes: 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition")
> would make sense to me since it introduces the parameter.

What is the observable behavior problem that this patch introduced?

> > forcing cache trimming while there is enough page cache is older than
> > the commit you refer to. It could be argued that this commit is
> > incomplete - it could have added refault detection not just to
> > inactive:active file balancing, but also the file:anon balancing; but
> > it didn't *cause* this problem.
> >
> > Shouldn't this be
> >
> > Fixes: e9868505987a ("mm,vmscan: only evict file pages when we have plenty")
> > Fixes: 7c5bd705d8f9 ("mm: memcg: only evict file pages when we have plenty")
>
> Those would be affected, too, but a stable backport would be troublesome
> since we don't have the refault machinery there.

Hm? The problematic behavior is that we force-scan file while file is
thrashing. We can obviously only solve this in kernels that can
actually detect thrashing.
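
For readers following along, here is a rough userspace sketch of the decision
being discussed. It is not the vmscan.c code: struct lru_model, pick_balance()
and the fixed inactive ratio of 1 are assumptions made up for illustration (the
kernel works on an lruvec and scales the ratio with list size). It only shows
how force-scanning file can kick in while file is thrashing, and how taking
refaults into account on the file:anon decision avoids that.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-LRU state; not the kernel's struct lruvec. */
struct lru_model {
	unsigned long inactive_file;     /* pages on the inactive file list */
	unsigned long active_file;       /* pages on the active file list */
	unsigned long refaults_snapshot; /* refault count at the last reclaim cycle */
	unsigned long refaults_now;      /* refault count right now */
};

enum scan_balance { SCAN_FILE_ONLY, SCAN_FILE_AND_ANON };

/* File is considered thrashing if refaults were seen since the snapshot. */
static bool file_is_thrashing(const struct lru_model *l)
{
	return l->refaults_now != l->refaults_snapshot;
}

/*
 * Is the inactive file list too small relative to the active one?
 * When refault detection is enabled and file is thrashing, the ratio
 * drops to 0, so any non-empty active list makes the inactive list "low".
 */
static bool inactive_file_is_low(const struct lru_model *l, bool detect_refaults)
{
	unsigned long ratio = 1; /* assumption: fixed ratio for this model */

	if (detect_refaults && file_is_thrashing(l))
		ratio = 0;
	return l->inactive_file * ratio < l->active_file;
}

/*
 * File:anon decision. With detect_refaults == false this models the old
 * behavior: plenty of inactive cache forces file-only reclaim, thrashing
 * or not. With detect_refaults == true, thrashing lets anon be scanned too.
 */
static enum scan_balance pick_balance(const struct lru_model *l, int priority,
				      bool detect_refaults)
{
	if (!inactive_file_is_low(l, detect_refaults) &&
	    (l->inactive_file >> priority))
		return SCAN_FILE_ONLY;
	return SCAN_FILE_AND_ANON;
}
```

With a large inactive file list and refaults pending, the model picks
file-only reclaim when refaults are ignored, and falls through to balancing
file and anon when they are considered. Without any refault counters there is
nothing to key the second case on.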