Re: [PATCH 1/1] mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary

From: Eric W. Biederman
Date: Thu Aug 20 2020 - 10:47:16 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 08/20, Eric W. Biederman wrote:
>>
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1139,6 +1139,10 @@ static int exec_mmap(struct mm_struct *mm)
>> vmacache_flush(tsk);
>> task_unlock(tsk);
>> if (old_mm) {
>> + mm->oom_score_adj = old_mm->oom_score_adj;
>> + mm->oom_score_adj_min = old_mm->oom_score_adj_min;
>> + if (tsk->vfork_done)
>> + mm->oom_score_adj = tsk->vfork_oom_score_adj;
>
> too late, ->vfork_done is NULL after mm_release().

Good point.

> And this can race with __set_oom_adj(). Yes, the current code is racy too,
> but this change adds another race, __set_oom_adj() could already observe
> ->mm != NULL and update mm->oom_score_adj.

I am not certain about races but we should be able to do something like:

in exec_mmap:
	if (old_mm) {
		mm->oom_score_adj = old_mm->oom_score_adj;
		mm->oom_score_adj_min = old_mm->oom_score_adj_min;
		if (tsk->signal->vfork_oom_score_adj_set) {
			mm->oom_score_adj = tsk->vfork_oom_score_adj;
			tsk->signal->vfork_oom_score_adj_set = false;
		}
	}

in __set_oom_adj:
	if (mm) {
		mm->oom_score_adj = oom_adj;
		tsk->signal->vfork_oom_score_adj_set = false;
	} else {
		tsk->vfork_oom_score_adj = oom_adj;
		tsk->signal->vfork_oom_score_adj_set = true;
	}
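
To pin down the intended behaviour, here is a rough userspace model of the
two snippets above. It is not kernel code: the model_* structs and functions
are made up for illustration, there is no locking at all, and I have placed
both hypothetical vfork_oom_score_adj fields in the signal stand-in purely
to keep the model small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; these are NOT the kernel's structures. */
struct model_mm {
	short oom_score_adj;
	short oom_score_adj_min;
};

struct model_signal {
	short vfork_oom_score_adj;	/* hypothetical: value parked for exec */
	bool vfork_oom_score_adj_set;	/* hypothetical: parked value is valid */
};

struct model_task {
	struct model_mm *mm;		/* NULL stands for the "else" case above */
	struct model_signal *signal;
};

/* Write side, mirroring the "if (mm) ... else ..." pseudocode. */
void model_set_oom_adj(struct model_task *tsk, short oom_adj)
{
	assert(tsk && tsk->signal);
	if (tsk->mm) {
		/* Normal case: update the mm and drop any stale parked value. */
		tsk->mm->oom_score_adj = oom_adj;
		tsk->signal->vfork_oom_score_adj_set = false;
	} else {
		/* No mm to update: park the value until exec installs one. */
		tsk->signal->vfork_oom_score_adj = oom_adj;
		tsk->signal->vfork_oom_score_adj_set = true;
	}
}

/* Exec side: inherit from old_mm, then let a parked value override it. */
void model_exec_mmap(struct model_task *tsk, struct model_mm *old_mm,
		     struct model_mm *mm)
{
	assert(tsk && tsk->signal && mm);
	if (old_mm) {
		mm->oom_score_adj = old_mm->oom_score_adj;
		mm->oom_score_adj_min = old_mm->oom_score_adj_min;
		if (tsk->signal->vfork_oom_score_adj_set) {
			mm->oom_score_adj = tsk->signal->vfork_oom_score_adj;
			tsk->signal->vfork_oom_score_adj_set = false;
		}
	}
	tsk->mm = mm;
}
```

The point of the model is only the ordering: a value written while there is
no mm is consumed exactly once by the next exec, and any write that does see
an mm invalidates it.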

There might even be a special oom_score_adj value we can use instead of
a separate flag. I am just not familiar enough with oom_score_adj to know.
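
If a sentinel were usable, it could look roughly like the sketch below. The
valid range of -1000..1000 matches OOM_SCORE_ADJ_MIN/OOM_SCORE_ADJ_MAX; the
choice of SHRT_MIN as the "unset" marker, the assumption that the value is
held in a short, and all MODEL_*/model_* names are mine and purely
illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_OOM_SCORE_ADJ_MIN	(-1000)
#define MODEL_OOM_SCORE_ADJ_MAX	1000

/* Hypothetical sentinel: outside the valid range, so never a real score. */
#define MODEL_OOM_ADJ_UNSET	SHRT_MIN

static_assert(MODEL_OOM_ADJ_UNSET < MODEL_OOM_SCORE_ADJ_MIN,
	      "sentinel must not collide with a valid oom_score_adj");

/* True if a value has been parked, i.e. it is not the sentinel. */
bool model_adj_is_set(short pending)
{
	return pending != MODEL_OOM_ADJ_UNSET;
}

/* At exec time: prefer the parked value, else inherit from the old mm. */
short model_pick_adj(short inherited, short pending)
{
	return model_adj_is_set(pending) ? pending : inherited;
}
```

This trades the extra bool for a reserved value, at the cost of every reader
having to know about the sentinel.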

We should be able to do something like that where we know the value is
set and only use it if so. And a subsequent __set_oom_adj without
observing vfork_done set will clear the value in signal_struct.

We have to be a bit careful to get the details right but it should be
straightforward.


Michal also has a point about oom_score_adj_min, and I really don't
understand the oom logic well enough to guess how that should
work.


Although to deal with some of the races it probably only makes sense
to call complete_vfork_done in exec after the new mm has been installed,
and while exec_update_mutex is held. I don't think anyone ever
anticipated using vfork_done as a flag.

Eric