Re: OOM detection regressions since 4.7

From: Michal Hocko
Date: Mon Aug 22 2016 - 09:42:46 EST


On Mon 22-08-16 09:31:14, Greg KH wrote:
> On Mon, Aug 22, 2016 at 12:54:41PM +0200, Michal Hocko wrote:
> > On Mon 22-08-16 06:05:28, Greg KH wrote:
> > > On Mon, Aug 22, 2016 at 11:37:07AM +0200, Michal Hocko wrote:
> > [...]
> > > > > From 899b738538de41295839dca2090a774bdd17acd2 Mon Sep 17 00:00:00 2001
> > > > > From: Michal Hocko <mhocko@xxxxxxxx>
> > > > > Date: Mon, 22 Aug 2016 10:52:06 +0200
> > > > > Subject: [PATCH] mm, oom: prevent premature OOM killer invocation for high
> > > > > order request
> > > > >
> > > > > There have been several reports about premature OOM killer invocation
> > > > > in the 4.7 kernel, where an order-2 allocation request (for the kernel
> > > > > stack) invoked the OOM killer even during basic workloads (light IO or even kernel
> > > > > compile on some filesystems). In all reported cases the memory is
> > > > > fragmented and there are no order-2+ pages available. There is usually
> > > > > a large amount of slab memory (usually dentries/inodes) and further
> > > > > debugging has shown that there are way too many unmovable blocks which
> > > > > are skipped during the compaction. Multiple reporters have confirmed that
> > > > > the current linux-next which includes [1] and [2] helped and OOMs are
> > > > > not reproducible anymore. A simpler fix for stable is to
> > > > > ignore the compaction feedback and retry as long as reclaim makes
> > > > > progress for high order requests, which is what we used to do before. We
> > > > > already do that for CONFIG_COMPACTION=n so let's reuse the same code when
> > > > > compaction is enabled as well.
> > > > >
> > > > > [1] http://lkml.kernel.org/r/20160810091226.6709-1-vbabka@xxxxxxx
> > > > > [2] http://lkml.kernel.org/r/f7a9ea9d-bb88-bfd6-e340-3a933559305a@xxxxxxx
> > > > >
> > > > > Fixes: 0a0337e0d1d1 ("mm, oom: rework oom detection")
> > > > > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> > > > > ---
> > > > > mm/page_alloc.c | 50 ++------------------------------------------------
> > > > > 1 file changed, 2 insertions(+), 48 deletions(-)
> > >
> > > So, if this goes into Linus's tree, can you let stable@xxxxxxxxxxxxxxx
> > > know about it so we can add it to the 4.7-stable tree? Otherwise
> > > there's not much I can do here now, right?
> >
> > My plan would actually be to not push this to Linus because we have a
> > proper fix for Linus's tree. It is just that the fix is quite large and I
> > felt like stable should get the simplest fix possible, which is
> > this partial revert. So, what I am proposing is to push a non-Linus
> > patch to stable as it is simpler.
>
> I _REALLY_ hate taking any patches that are not in Linus's tree as 90%
> of the time (well, almost always), it ends up being wrong and hurting us
> in the end.

I do not like it either, but if there is a simple and straightforward
workaround for stable while upstream can go with the _proper_ fix
from the long-term POV then I think this is perfectly justified. Stable
should always be about the simplest fix for the problem IMHO.

Of course, if Linus/Andrew do not want to take those compaction
improvements this late then I will ask to merge the partial revert to
Linus's tree as well and then there is not much to discuss.

> What exactly are the commits that are in Linus's tree that resolve this
> issue?

The initial email in this thread has pointed to those patches. Please
note that some of their dependencies (mostly code cleanups) are already
merged and that backporting without them would make the backport harder
and more risky.
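
For illustration only, the retry policy described in the quoted changelog
(ignore the compaction feedback for non-costly high-order requests and keep
retrying while reclaim can still help) might be sketched in userspace C as
below. This is not the mm/page_alloc.c code. struct zone_state,
should_retry_high_order() and COSTLY_ORDER are hypothetical names, and using
"free pages above the min watermark in some zone" as the sign that retrying
is still worthwhile is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-zone state; not the kernel's struct zone. */
struct zone_state {
    unsigned long free_pages;    /* free order-0 pages in the zone */
    unsigned long min_watermark; /* minimum watermark for the zone */
};

/* Orders above this count as "costly" in this sketch (assumed value). */
#define COSTLY_ORDER 3u

/*
 * Decide whether a failed allocation of the given order should be retried.
 * Note that no compaction result is passed in: the policy deliberately
 * ignores compaction feedback.
 */
bool should_retry_high_order(unsigned int order,
                             const struct zone_state *zones, size_t nr_zones)
{
    /* Order-0 and costly orders are not handled by this retry path. */
    if (order == 0 || order > COSTLY_ORDER)
        return false;

    /* Keep retrying while at least one zone is above its min watermark. */
    for (size_t i = 0; i < nr_zones; i++) {
        if (zones[i].free_pages > zones[i].min_watermark)
            return true;
    }
    return false;
}

/* Example: an order-2 request (kernel stack) with one healthy zone. */
void example_usage(void)
{
    struct zone_state zones[] = { { 10, 50 }, { 200, 50 } };

    assert(should_retry_high_order(2, zones, 2));
}
```

The point of the sketch is only that the decision depends on the order and
on the memory still available, and not on what compaction reported.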
--
Michal Hocko
SUSE Labs