Re: [patch] mm, oom: prevent additional oom kills before memory is freed

From: David Rientjes
Date: Thu Jun 15 2017 - 17:37:52 EST


On Thu, 15 Jun 2017, Tetsuo Handa wrote:

> David is trying to avoid setting MMF_OOM_SKIP when the OOM reaper found that
> mm->users == 0.

Yes, because setting MMF_OOM_SKIP allows the oom killer to select another
process to kill, and it will do so before the original victim's mm has been
able to undergo exit_mmap(). So now we kill two or more processes when one
would have sufficed; I have seen up to four processes killed unnecessarily
without this patch.
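
To illustrate the effect, here is a minimal userspace sketch. It is not
kernel code: struct toy_task, toy_select_victim() and toy_count_kills() are
hypothetical names, the oom_skip field only stands in for MMF_OOM_SKIP on
the victim's mm, and rounds_until_freed is an assumed number of selection
passes that happen before the first victim's memory is actually released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these types do not exist in the kernel. */
struct toy_task {
	long badness;	/* stand-in for the oom badness score */
	bool oom_skip;	/* stand-in for MMF_OOM_SKIP on the task's mm */
};

/* Return the index of the highest-badness task not marked oom_skip, or -1. */
static int toy_select_victim(const struct toy_task *tasks, size_t n)
{
	int best = -1;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].oom_skip)
			continue;	/* skipped victims are invisible to selection */
		if (best < 0 || tasks[i].badness > tasks[best].badness)
			best = (int)i;
	}
	return best;
}

/*
 * Count the kills issued for one oom condition.
 *
 * skip_before_free == true models marking the victim as skippable before
 * its memory is freed: the condition persists, so each further selection
 * pass picks a new victim.
 * skip_before_free == false models waiting for the first victim to free
 * its memory: no further victim is selected.
 */
static int toy_count_kills(struct toy_task *tasks, size_t n,
			   bool skip_before_free, int rounds_until_freed)
{
	int kills = 0;

	for (int round = 0; round < rounds_until_freed; round++) {
		int victim = toy_select_victim(tasks, n);

		if (victim < 0)
			break;		/* nothing left to kill */
		kills++;
		if (!skip_before_free)
			break;		/* wait for this victim to exit */
		tasks[victim].oom_skip = true;
	}
	return kills;
}
```

With three candidate tasks and three selection passes before the memory is
freed, the model issues three kills when the skip flag is set early and one
kill when it is not.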

> But we must not wait forever because __mmput() might fail to
> release some memory immediately. If __mmput() did not release some memory within
> schedule_timeout_idle(HZ/10) * MAX_OOM_REAP_RETRIES sleep, let the OOM killer
> invoke again. So, this is the case we want to address here, isn't it?
>

How long it is sensible to wait for __mmput() to even get a chance to be
called is obviously a function of the number of threads that share the mm
with the oom victim. It may also require allowing a non-zero number of
those threads to allocate from memory reserves so that they can eventually
drop mm->mmap_sem and make forward progress.

I have not witnessed any thread stalling in __mmput() that prevents the
mm's memory from being freed. I have witnessed several processes oom killed
unnecessarily for a single oom condition where, before MMF_OOM_SKIP was
introduced, a single oom kill would have sufficed.