Re: [patch] oom: give current access to memory reserves if it hasbeen killed

From: David Rientjes
Date: Thu Apr 08 2010 - 17:08:48 EST


On Thu, 1 Apr 2010, Oleg Nesterov wrote:

> > > > > Say, oom_forkbomb_penalty() does list_for_each_entry(tsk->children).
> > > > > Again, this is not right even if we forget about !child->mm check.
> > > > > This list_for_each_entry() can only see the processes forked by the
> > > > > main thread.
> > > > >
> > > >
> > > > That's the intention.
> > >
> > > Why? shouldn't oom_badness() return the same result for any thread
> > > in thread group? We should take all childs into account.
> > >
> >
> > oom_forkbomb_penalty() only cares about first-descendant children that
> > do not share the same memory,
>
> I see, but the code doesn't really do this. I mean, it doesn't really
> see the first-descendant children, only those which were forked by the
> main thread.
>
> Look. We have a main thread M and the sub-thread T. T forks a lot of
> processes which use a lot of memory. These processes _are_ the first
> descendant children of the M+T thread group, they should be accounted.
> But M->children list is empty.
>
> oom_forkbomb_penalty() and oom_kill_process() should do
>
> t = tsk;
> do {
> list_for_each_entry(child, &t->children, sibling) {
> ... take child into account ...
> }
> } while_each_thread(tsk, t);
>
>

In this case, it seems more appropriate that we would penalize T and not M
since it's not necessarily responsible for the behavior of the children it
forks. T is the buggy/malicious program, not M.

> See the patch below. Yes, this is minor, but it is always good to avoid
> the unnecessary locks, and thread_group_cputime() is O(N).
>
> Not only for performance reasons. This allows to change the locking in
> thread_group_cputime() if needed without fear to deadlock with task_lock().
>
> Oleg.
>
> --- x/mm/oom_kill.c
> +++ x/mm/oom_kill.c
> @@ -97,13 +97,16 @@ static unsigned long oom_forkbomb_penalt
> return 0;
> list_for_each_entry(child, &tsk->children, sibling) {
> struct task_cputime task_time;
> - unsigned long runtime;
> + unsigned long runtime, this_rss;
>
> task_lock(child);
> if (!child->mm || child->mm == tsk->mm) {
> task_unlock(child);
> continue;
> }
> + this_rss = get_mm_rss(child->mm);
> + task_unlock(child);
> +
> thread_group_cputime(child, &task_time);
> runtime = cputime_to_jiffies(task_time.utime) +
> cputime_to_jiffies(task_time.stime);
> @@ -113,10 +116,9 @@ static unsigned long oom_forkbomb_penalt
> * get to execute at all in such cases anyway.
> */
> if (runtime < HZ) {
> - child_rss += get_mm_rss(child->mm);
> + child_rss += this_rss;
> forkcount++;
> }
> - task_unlock(child);
> }
>
> /*

This patch looks good, will you send it to Andrew with a changelog and
sign-off line? Also feel free to add:

Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/