Re: [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation

From: Michal Hocko
Date: Wed Sep 26 2018 - 04:25:04 EST


On Wed 26-09-18 09:13:43, Roman Gushchin wrote:
> On Tue, Sep 25, 2018 at 08:58:45PM +0200, Michal Hocko wrote:
> > On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> > > The memcg OOM killer is never invoked due to a failed high-order
> > > allocation; however, the MEMCG_OOM event can still be raised.
> > >
> > > As shown below, it can happen under conditions which are very
> > > far from a real OOM: e.g. when there is plenty of clean pagecache
> > > and low memory pressure.
> > >
> > > There is no sense in raising an OOM event in such a case,
> > > as it might confuse a user and lead to wrong and excessive actions.
> > >
> > > Let's look at the charging path in try_charge(). If the memory usage
> > > is close to memory.max, which is absolutely natural for most memory
> > > cgroups, we try to reclaim some pages. Even if we were able to reclaim
> > > enough memory for the allocation, the following check can fail due to
> > > a race with another concurrent allocation:
> > >
> > > if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> > > goto retry;
> > >
> > > For regular pages the following condition will save us from triggering
> > > the OOM:
> > >
> > > if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> > > goto retry;
> > >
> > > But for a high-order allocation this condition intentionally fails.
> > > The reasoning behind it is that we'll likely fall back to regular
> > > pages anyway, so it's ok and even preferable to return -ENOMEM.
> > >
> > > In this case the idea of raising MEMCG_OOM looks dubious.
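
[ For illustration, a condensed sketch of where such a bail-out could
  sit in the charge path. This is not the literal diff, and the rest of
  mem_cgroup_oom() is elided:

	static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg,
					      gfp_t mask, int order)
	{
		/*
		 * A failed costly allocation never invokes the memcg OOM
		 * killer; the caller is expected to fall back to smaller
		 * pages, so do not account a MEMCG_OOM event either.
		 */
		if (order > PAGE_ALLOC_COSTLY_ORDER)
			return OOM_SKIPPED;

		memcg_memory_event(memcg, MEMCG_OOM);

		/* ... the rest of the OOM handling is unchanged ... */
	}
]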
> >
> > I would really appreciate an example of an application that would get
> > confused by consuming this event, and an explanation why. I do agree
> > that the event itself is kinda weird, because it doesn't give you any
> > context about the kind of request for which the memcg is OOM. Costly
> > orders are a slightly different story than other requests, and users
> > shouldn't have to care about that, because it is a mere implementation
> > detail.
>
> Our container management system (called Tupperware) used the OOM event
> as a signal that a workload might be affected by the OOM killer, so
> it restarted the corresponding container.
>
> I started looking at this problem when it was reported to me that it
> sometimes happens when there is plenty of inactive page cache, and
> there are no signs that the OOM killer has been invoked at all.
> The proposed patch resolves this problem.

Thanks! This is exactly the kind of information that should be in the
changelog. With the changelog updated and an explicit note in the
documentation that the event is triggered only when the memcg is _going_
to consider the oom killer as the only option, you can add

Acked-by: Michal Hocko <mhocko@xxxxxxxx>
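
For reference, a container manager like the one described above would
typically watch the cgroup v2 memory.events file for this. A minimal
userspace sketch (the "workload" cgroup path is made up):

	#include <stdio.h>
	#include <string.h>

	/* hypothetical cgroup; substitute the real container path */
	#define EVENTS "/sys/fs/cgroup/workload/memory.events"

	int main(void)
	{
		char key[32];
		unsigned long val, oom = 0, oom_kill = 0;
		FILE *f = fopen(EVENTS, "r");

		if (!f) {
			perror(EVENTS);
			return 1;
		}
		while (fscanf(f, "%31s %lu", key, &val) == 2) {
			if (!strcmp(key, "oom"))
				oom = val;
			else if (!strcmp(key, "oom_kill"))
				oom_kill = val;
		}
		fclose(f);

		/*
		 * Restarting on "oom" alone over-reacts (see the
		 * high-order case above); "oom_kill" only moves when a
		 * task was actually killed, but can stay at zero if no
		 * victim was eligible.
		 */
		printf("oom=%lu oom_kill=%lu\n", oom, oom_kill);
		return 0;
	}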

> > In other words, do we have any users who actually care about this half
> > baked event at all? Shouldn't we simply stop emitting it (or make it an
> > alias of OOM_KILL) rather than making it slightly better but still kinda
> > incomplete?
>
> The only problem I see with OOM_KILL is that it might not be raised
> at all if the OOM killer is not able to find an appropriate victim,
> for instance, if all tasks are oom-protected (oom_score_adj set to -1000).

This is a very good point.
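
For completeness, the "no eligible victim" situation is trivial to set
up from userspace; a minimal sketch (lowering the value below the
inherited one needs CAP_SYS_RESOURCE):

	#include <stdio.h>

	/*
	 * Mark the current task as ineligible for the OOM killer.  If
	 * every task in a memcg does this, an OOM situation produces no
	 * OOM_KILL event at all, while MEMCG_OOM still fires.
	 */
	int main(void)
	{
		FILE *f = fopen("/proc/self/oom_score_adj", "w");

		if (!f) {
			perror("oom_score_adj");
			return 1;
		}
		fprintf(f, "-1000\n");
		fclose(f);

		/* ... the OOM-protected workload would run here ... */
		return 0;
	}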
--
Michal Hocko
SUSE Labs