Re: [patch] oom: give current access to memory reserves if it has been killed

From: Oleg Nesterov
Date: Tue Mar 30 2010 - 11:49:16 EST


On 03/29, David Rientjes wrote:
>
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -681,6 +681,16 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> }
>
> /*
> + * If current has a pending SIGKILL, then automatically select it. The
> + * goal is to allow it to allocate so that it may quickly exit and free
> + * its memory.
> + */
> + if (fatal_signal_pending(current)) {
> + __oom_kill_task(current);

I am worried...

Note that __oom_kill_task() does force_sig(SIGKILL) which assumes that
->sighand != NULL. This is not true if out_of_memory() is called after
current has already passed exit_notify().
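
To make the concern concrete, here is a rough user-space model of it. Nothing below is kernel code: struct task_model, model_force_sigkill() and model_oom_kill() are hypothetical stand-ins, and the guard shown is only one possible shape, not a fix proposed in this thread. It assumes that a task with a pending SIGKILL does not need the signal sent again, and that a task without ->sighand must never be signalled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for task_struct; not the kernel type. */
struct task_model {
	void *sighand;		/* NULL models a task past exit_notify() */
	bool sigkill_pending;	/* models fatal_signal_pending() */
	bool has_reserves;	/* models access to memory reserves */
};

/* Models force_sig(SIGKILL): only valid while sighand is still set. */
static void model_force_sigkill(struct task_model *t)
{
	assert(t->sighand != NULL);	/* the kernel would dereference NULL here */
	t->sigkill_pending = true;
}

/*
 * Grant reserves to a task; send SIGKILL only if it is both needed and safe.
 * Returns false when the signal could not be delivered.
 */
static bool model_oom_kill(struct task_model *t)
{
	assert(t != NULL);
	t->has_reserves = true;
	if (t->sigkill_pending)
		return true;		/* already dying, nothing to send */
	if (!t->sighand)
		return false;		/* past exit_notify(): must not signal */
	model_force_sigkill(t);
	return true;
}
```

In this model a task that is already dying gets the reserves without ever reaching the force_sig() stand-in, which is the case the quoted hunk cares about.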


Hmm. Looking at oom_kill.c... Afaics there are more problems with mt
applications. select_bad_process() does for_each_process() which can
only see the group leaders. This is fine, but what if ->group_leader
has already exited? In this case its ->mm == NULL, and we ignore the
whole thread group.

IOW, unless I missed something, it is very easy to hide the process
from oom-kill:

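#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	pthread_t t;
	pthread_create(&t, NULL, memory_hog_func, NULL); /* thread keeps allocating */
	syscall(__NR_exit, 0); /* only the leader exits */
}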



probably we need something like

--- x/mm/oom_kill.c
+++ x/mm/oom_kill.c
@@ -246,21 +246,27 @@ static enum oom_constraint constrained_a
 static struct task_struct *select_bad_process(unsigned long *ppoints,
 						struct mem_cgroup *mem)
 {
-	struct task_struct *p;
+	struct task_struct *g, *p;
 	struct task_struct *chosen = NULL;
 	struct timespec uptime;
 	*ppoints = 0;

 	do_posix_clock_monotonic_gettime(&uptime);
-	for_each_process(p) {
+	for_each_process(g) {
 		unsigned long points;

 		/*
 		 * skip kernel threads and tasks which have already released
 		 * their mm.
 		 */
+		p = g;
+		do {
+			if (p->mm)
+				break;
+		} while_each_thread(g, p);
 		if (!p->mm)
 			continue;
+
 		/* skip the init task */
 		if (is_global_init(p))
 			continue;

except it should be simplified and is_global_init() should check g.

No?
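
The idea behind the hunk can be sketched outside the kernel as follows. This is a user-space model only: struct task_model, leader_only() and find_thread_with_mm() are hypothetical names, and the circular next_thread list merely imitates what while_each_thread() walks. It contrasts a check that looks at the group leader alone with one that looks for any thread that still has an mm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for task_struct; not the kernel type. */
struct task_model {
	void *mm;			/* NULL once this thread has released its mm */
	struct task_model *next_thread;	/* circular list of threads in the group */
};

/* Leader-only check: a group whose leader has exited is skipped entirely. */
static struct task_model *leader_only(struct task_model *leader)
{
	assert(leader != NULL);
	return leader->mm ? leader : NULL;
}

/* Walk the whole group and return the first thread that still has an mm. */
static struct task_model *find_thread_with_mm(struct task_model *leader)
{
	struct task_model *t = leader;

	assert(leader != NULL);
	do {
		if (t->mm)
			return t;
		t = t->next_thread;
	} while (t != leader);

	return NULL;	/* every thread has released its mm */
}
```

With a leader whose mm is NULL and one sub-thread that still holds an mm, leader_only() returns NULL while find_thread_with_mm() returns the sub-thread, which is the test case above in miniature.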


Oh... proc_oom_score() is racy. We can't trust ->group_leader even
under tasklist_lock. If we race with exit/exec it can point to
nowhere. I'll send the simple fix.

Oleg.
