Re: [PATCH] copy over oom_adj value at fork time

From: Paul Menage
Date: Fri Jul 17 2009 - 16:04:12 EST


On Fri, Jul 17, 2009 at 2:34 AM, David Rientjes<rientjes@xxxxxxxxxx> wrote:
>
> The only way to work around that is by using the highest oom_adj user for
> the mm_struct from the array in reporting /proc/pid/oom_score, as well.

That sounds fine to me.

> But that would lead to /proc/pid/oom_adj not affecting oom_score at all,
> which isn't consistent.

Isn't consistent with what? It's perfectly consistent with saying "the
oom_score of a task is based on the highest oom_adj value of any task
sharing the same mm". Admittedly it's not 100% consistent with the old
semantics, but I'm having trouble imagining a scenario where someone
was relying on the part of the semantics that would change.
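
To make that rule concrete, here is a minimal userspace sketch. It is
not kernel code: struct mm_user, its fields and effective_oom_adj() are
hypothetical names invented for illustration, and mm sharing is modelled
with a plain integer id.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task; NOT the kernel's task_struct. */
struct mm_user {
    int pid;
    int mm_id;    /* tasks with the same mm_id share an address space */
    int oom_adj;  /* per-task value, as written to /proc/pid/oom_adj */
};

/*
 * Return the highest oom_adj among all tasks that share t's mm.
 * Under the rule described above, this value (rather than t's own
 * oom_adj) would feed into the score reported for t.
 */
static int effective_oom_adj(const struct mm_user *tasks, size_t n,
                             const struct mm_user *t)
{
    int highest = t->oom_adj;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].mm_id == t->mm_id && tasks[i].oom_adj > highest)
            highest = tasks[i].oom_adj;
    }
    return highest;
}
```

With two tasks sharing one mm at oom_adj -5 and 3, both would be scored
as if their oom_adj were 3; a task alone in its mm keeps its own value.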

But taking a completely different approach, is there a reason that we
couldn't have just moved the do_each_thread() check for OOM_DISABLE
out of oom_kill_task() and into select_bad_process() at the point
where we've decided that the thread in question is a better victim
than the current victim? That would fix the OOM livelock while
allowing us to keep exactly the same oom_adj/oom_score semantics as in
previous kernels.
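
A rough userspace sketch of that alternative, under stated assumptions:
struct candidate, mm_has_disabled_user() and select_victim() are
hypothetical names, the points field stands in for whatever badness()
would return, and the linear scan stands in for the do_each_thread()
walk. It is meant to show the ordering of the checks, not to be a patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define OOM_DISABLE (-17)  /* value used by kernels of this era */

/* Hypothetical stand-in for a task; NOT the kernel's task_struct. */
struct candidate {
    int pid;
    int mm_id;             /* tasks with the same mm_id share an mm */
    int oom_adj;
    unsigned long points;  /* stand-in for the task's badness score */
};

/* Does any task sharing this mm have oom_adj == OOM_DISABLE? */
static bool mm_has_disabled_user(const struct candidate *tasks, size_t n,
                                 int mm_id)
{
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].mm_id == mm_id && tasks[i].oom_adj == OOM_DISABLE)
            return true;
    }
    return false;
}

/* Pick the highest-scoring task whose mm has no OOM_DISABLE user. */
static const struct candidate *select_victim(const struct candidate *tasks,
                                             size_t n)
{
    const struct candidate *chosen = NULL;
    unsigned long best = 0;

    for (size_t i = 0; i < n; i++) {
        const struct candidate *p = &tasks[i];

        if (p->oom_adj == OOM_DISABLE)
            continue;
        if (p->points <= best)
            continue;  /* not a better victim than the current one */
        /*
         * Only now, when p would replace the current victim, scan for
         * an OOM_DISABLE task sharing its mm. Rejecting p here means an
         * unkillable task is never returned, so the caller cannot keep
         * retrying the same victim.
         */
        if (mm_has_disabled_user(tasks, n, p->mm_id))
            continue;
        chosen = p;
        best = p->points;
    }
    return chosen;
}
```

In this sketch a high-scoring task that shares its mm with an
OOM_DISABLE task is simply passed over in favour of the next-best
candidate, and each task's own oom_adj keeps its existing meaning.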

> The inheritance issue should be fixed with Rik's patch with the exception
> of vfork -> change /proc/pid-of-child/oom_adj -> execve.  If scripts were
> written to do that with the old behavior, they'll have to adjust to change
> oom_adj _after_ the execve

Think about what you're suggesting here - execve() replaces your code
with whatever you're execing, so unless that code is also written to
handle oom_adj (which for something like a generic job scheduler, the
exec'd code is unlikely to do) you're stuck.

Paul