Re: [patch] mm, oom: prevent additional oom kills before memory is freed

From: Michal Hocko
Date: Thu Jun 15 2017 - 08:03:44 EST


On Thu 15-06-17 20:32:39, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > An alternative would be to allow reaping and exit_mmap race. The unmap
> > part should just work I guess. We just have to be careful to not race
> > with free_pgtables and that shouldn't be too hard to implement (e.g.
> > (ab)use mmap_sem for write there). I haven't thought that through
> > completely though so I might miss something of course.
>
> I think below one is simpler.
[...]
> @@ -556,25 +553,21 @@ static void oom_reap_task(struct task_struct *tsk)
> struct mm_struct *mm = tsk->signal->oom_mm;
>
> /* Retry the down_read_trylock(mmap_sem) a few times */
> - while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
> + while (__oom_reap_task_mm(tsk, mm), !test_bit(MMF_OOM_SKIP, &mm->flags)
> + && attempts++ < MAX_OOM_REAP_RETRIES)
> schedule_timeout_idle(HZ/10);
>
> - if (attempts <= MAX_OOM_REAP_RETRIES)
> - goto done;
> -
> -
> - pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
> - task_pid_nr(tsk), tsk->comm);
> - debug_show_all_locks();
> -
> -done:
> - tsk->oom_reaper_list = NULL;
> -
> /*
> * Hide this mm from OOM killer because it has been either reaped or
> * somebody can't call up_write(mmap_sem).
> */
> - set_bit(MMF_OOM_SKIP, &mm->flags);
> + if (!test_and_set_bit(MMF_OOM_SKIP, &mm->flags)) {
> + pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
> + task_pid_nr(tsk), tsk->comm);
> + debug_show_all_locks();
> + }
> +

How does this _solve_ anything? Why would you even retry when you
_know_ that the reference count has dropped to zero? It will never
increment again. So the above is basically just
schedule_timeout_idle(HZ/10) * MAX_OOM_REAP_RETRIES before we set
MMF_OOM_SKIP. This might be enough for the victim to finish exit_mmap,
but it is more of a hack^Wworkaround than anything else. You could very
well do the sleep without any obfuscation...
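
To make the point concrete, here is a rough user-space sketch, not kernel
code. Everything prefixed with fake_ is a hypothetical stand-in, the value
of MAX_OOM_REAP_RETRIES is assumed for illustration, and the reap attempt
is modelled as doing nothing once the user count is zero. Under those
assumptions the loop from the quoted patch performs exactly as many sleeps
as a plain delay loop would:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OOM_REAP_RETRIES 10 /* assumed value, for illustration only */

/* Hypothetical stand-in for the mm: only what this sketch needs. */
struct fake_mm {
	int users;     /* models the reference count; stays at zero once there */
	bool oom_skip; /* models MMF_OOM_SKIP */
};

static int sleeps; /* counts simulated schedule_timeout_idle(HZ/10) calls */

static void fake_sleep(void)
{
	sleeps++;
}

/* Models a reap attempt that cannot make progress on a dead mm. */
static void fake_reap(struct fake_mm *mm)
{
	if (mm->users == 0)
		return;          /* nothing to pin, so the flag is never set */
	mm->oom_skip = true; /* reaped: hide the mm from further kills */
}

/* Same loop shape as the quoted patch; returns the number of sleeps. */
static int patched_loop(struct fake_mm *mm)
{
	int attempts = 0;

	assert(mm != NULL);
	sleeps = 0;
	while (fake_reap(mm), !mm->oom_skip
	       && attempts++ < MAX_OOM_REAP_RETRIES)
		fake_sleep();
	return sleeps;
}

/* The same delay written without the retry logic. */
static int plain_sleep(void)
{
	sleeps = 0;
	for (int i = 0; i < MAX_OOM_REAP_RETRIES; i++)
		fake_sleep();
	return sleeps;
}
```

With users == 0 both functions return MAX_OOM_REAP_RETRIES and the flag is
still clear afterwards, so the retries add nothing beyond the delay itself.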
--
Michal Hocko
SUSE Labs