Re: [PATCH 1/1] mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary

From: Oleg Nesterov
Date: Thu Aug 20 2020 - 10:36:50 EST


On 08/20, Oleg Nesterov wrote:
>
> On 08/20, Eric W. Biederman wrote:
> >
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1139,6 +1139,10 @@ static int exec_mmap(struct mm_struct *mm)
> >  	vmacache_flush(tsk);
> >  	task_unlock(tsk);
> >  	if (old_mm) {
> > +		mm->oom_score_adj = old_mm->oom_score_adj;
> > +		mm->oom_score_adj_min = old_mm->oom_score_adj_min;
> > +		if (tsk->vfork_done)
> > +			mm->oom_score_adj = tsk->vfork_oom_score_adj;
>
> too late, ->vfork_done is NULL after mm_release().
>
> And this can race with __set_oom_adj(). Yes, the current code is racy too,
> but this change adds another race, __set_oom_adj() could already observe
> ->mm != NULL and update mm->oom_score_adj.
  ^^^^^^^^^^^^

I meant ->mm == new_mm.

And another problem. Suppose we have

	if (!vfork()) {
		change_oom_score();
		exec();
	}

The parent can be killed before the child execs; in that case
vfork_oom_score_adj will be lost.
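
For concreteness, a minimal userspace sketch of the vfork()/exec() pattern
above, assuming Linux with /proc mounted. The helper name
vfork_adjust_and_exec() is hypothetical and only stands in for the
change_oom_score()/exec() pseudocode. POSIX leaves most calls in a vfork()
child undefined, so this relies on Linux behaviour and is meant as an
illustration, not as recommended practice.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: vfork(), write `adj` to the child's
 * /proc/self/oom_score_adj, then exec argv[0].
 * Returns the child's exit status, or -1 on error.
 */
static int vfork_adjust_and_exec(int adj, char *const argv[])
{
	char buf[16];
	int status;
	/* Format the value before vfork() so the child only makes syscalls. */
	int len = snprintf(buf, sizeof(buf), "%d", adj);
	pid_t pid = vfork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: it has not exec'ed yet, so it still runs on the parent's mm. */
		int fd = open("/proc/self/oom_score_adj", O_WRONLY);

		if (fd >= 0) {
			/* change_oom_score(): a failed write is ignored in this sketch. */
			ssize_t n = write(fd, buf, (size_t)len);
			(void)n;
			close(fd);
		}
		/* exec(): only from here on does the child get a new mm. */
		execvp(argv[0], argv);
		_exit(127);	/* reached only if exec failed */
	}

	/* Parent: suspended until the child execs or exits, then reaps it. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with adj = 500 and argv = { "sh", "-c", "exit 7", NULL }, this is
expected to return 7 where sh is on PATH. The window in question is between
the write() and the execvp() in the child.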

Oleg.