Re: can't oom-kill zap the victim's memory?

From: Oleg Nesterov
Date: Tue Sep 22 2015 - 08:46:13 EST


On 09/22, Tetsuo Handa wrote:
>
> I imagined a dedicated kernel thread doing something like shown below.
> (I don't know about mm->mmap management.)
> mm->mmap_zapped corresponds to MMF_MEMDIE.

No, it doesn't, please see below.

> bool has_sigkill_task;
> wait_queue_head_t kick_mm_zapper;

OK, if this kthread is kicked by oom this makes more sense, but still
doesn't look right at least initially.

Let me repeat, I do think we need MMF_MEMDIE or something like it before
we do something more clever. And in fact I think this flag makes sense
regardless.

> static void mm_zapper(void *unused)
> {
> struct task_struct *g, *p;
> struct mm_struct *mm;
>
> sleep:
> wait_event(kick_remover, has_sigkill_task);
> has_sigkill_task = false;
> restart:
> rcu_read_lock();
> for_each_process_thread(g, p) {
> if (likely(!fatal_signal_pending(p)))
> continue;
> task_lock(p);
> mm = p->mm;
> if (mm && mm->mmap && !mm->mmap_zapped && down_read_trylock(&mm->mmap_sem)) {
^^^^^^^^^^^^^^^

We do not want mm->mmap_zapped, it can't work. We need mm->needs_zap
set by oom_kill_process() and cleared after zap_page_range().

Because otherwise we can not handle CLONE_VM correctly. Suppose that
an innocent process P does vfork() and the child is killed but not
exited yet. mm_zapper() can find the child, do zap_page_range(), and
surprise its alive parent P which uses the same ->mm.

And if we rely on MMF_MEMDIE or mm->needs_zap or whaveter then
for_each_process_thread() doesn't really make sense. And if we have
a single MMF_MEMDIE process (likely case) then the unconditional
_trylock is suboptimal.

Tetsuo, can't we do something simple which "obviously can't hurt at
least" and then discuss the potential improvements?

And yes, yes, the "Kill all user processes sharing victim->mm" logic
in oom_kill_process() doesn't 100% look right, at least wrt the change
we discuss.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/