Re: [RFC PATCH] mm, oom: move GFP_NOFS check to out_of_memory

From: Tetsuo Handa
Date: Tue Mar 29 2016 - 11:29:38 EST


Michal Hocko wrote:
> On Tue 29-03-16 22:45:40, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@xxxxxxxx>
> > >
> > > __alloc_pages_may_oom is the central place to decide when the
> > > out_of_memory should be invoked. This is a good approach for most checks
> > > there because they are page allocator specific and the allocation fails
> > > right after.
> > >
> > > The notable exception is GFP_NOFS context which is faking
> > > did_some_progress and keep the page allocator looping even though there
> > > couldn't have been any progress from the OOM killer. This patch doesn't
> > > change this behavior because we are not ready to allow those allocation
> > > requests to fail yet. Instead __GFP_FS check is moved down to
> > > out_of_memory and prevent from OOM victim selection there. There are
> > > two reasons for that
> > > - OOM notifiers might release some memory even from this context
> > > as none of the registered notifier seems to be FS related
> > > - this might help a dying thread to get an access to memory
> > > reserves and move on which will make the behavior more
> > > consistent with the case when the task gets killed from a
> > > different context.
> >
> > Allowing !__GFP_FS allocations to get TIF_MEMDIE by calling the shortcuts in
> > out_of_memory() would be fine. But I don't like the direction you want to go.
> >
> > I don't like failing !__GFP_FS allocations without selecting OOM victim
> > ( http://lkml.kernel.org/r/201603252054.ADH30264.OJQFFLMOHFSOVt@xxxxxxxxxxxxxxxxxxx ).
>
> I didn't get to read and digest that email yet but from a quick glance
> it doesn't seem to be directly related to this patch. Even if we decide
> that __GFP_FS vs. OOM killer logic is flawed for some reason then would
> build on top as granting the access to memory reserves is not against
> it.
>

I think that removing these shortcuts is better.

> > Also, I suggested removing all shortcuts by setting TIF_MEMDIE from oom_kill_process()
> > ( http://lkml.kernel.org/r/1458529634-5951-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx ).
>
> I personally do not like this much. I believe we have already tried to
> explain why we have (some of) those shortcuts. They might be too
> optimistic and there is a room for improvements for sure but I am not
> convinced we can get rid of them that easily.

These shortcuts are too optimistic. They assume that the target thread can call
exit_oom_victim() but the reality is that the target task can get stuck at
down_read(&mm->mmap_sem) in exit_mm(). If SIGKILL were sent to all thread
groups sharing that mm, the possibility of the target thread getting stuck at
down_read(&mm->mmap_sem) in exit_mm() is significantly reduced.

http://lkml.kernel.org/r/20160329141442.GD4466@xxxxxxxxxxxxxx tried to let
the OOM reaper to call exit_oom_victim() on behalf of the target thread
by waking up the OOM reaper. But the OOM reaper won't call exit_oom_victim()
because the OOM reaper will fail to reap memory because some thread sharing
that mm and holding mm->mmap_sem for write will not receive SIGKILL if we use
these shortcuts. As far as I know, all existing explanations for why we have
these shortcuts are ignoring the possibility of such some thread.