Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

From: Tetsuo Handa
Date: Thu Sep 24 2015 - 07:50:11 EST


Kyle Walker wrote:
> I agree, in lieu of treating TASK_UNINTERRUPTIBLE tasks as unkillable,
> and omitting them from the oom selection process, continuing the
> carnage is likely to result in more unpredictable results. At this
> time, I believe Oleg's solution of zapping the process memory use
> while it sleeps with the fatal signal enroute is ideal.

I cannot help thinking about the worst case.

(1) If memory zapping code successfully reclaimed some memory from
the mm struct used by the OOM victim, what guarantees that the
reclaimed memory is used by OOM victims (and processes which
are blocking OOM victims)?

David's "global access to memory reserves" allows a local unprivileged
user to deplete memory reserves; it could allow that user to deplete
the reclaimed memory as well.

I think that my "Favor kthread and dying threads over normal threads"
( http://lkml.kernel.org/r/1442939668-4421-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx )
would allow the reclaimed memory to be used by OOM victims and kernel
threads, provided the reclaimed memory is added to the free list bit by
bit in a way that the watermark remains low enough to prevent normal
threads from allocating the reclaimed memory.

But my patch still fails if normal threads are blocking the OOM
victims, or if unrelated kernel threads consume the reclaimed memory.
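
To make the "bit by bit" idea concrete, here is a toy model in C. It is
not kernel code: toy_zone, may_allocate() and release_reclaimed() are
hypothetical names, and the real page allocator's watermark checks are
far more involved. It only assumes that normal threads must leave
min_wmark pages free while favored threads (OOM victims, kthreads) may
dig below it, and that reclaimed pages are released in batches capped
at the watermark.

```c
#include <stdbool.h>

/* Toy model only; not the kernel's page allocator. Names are hypothetical. */
struct toy_zone {
	unsigned long free_pages;	/* pages currently on the free list */
	unsigned long min_wmark;	/* normal threads must leave this many free */
};

/*
 * Normal threads must leave at least min_wmark pages free after the
 * allocation. Favored threads (OOM victims, kthreads) may go below it.
 */
static bool may_allocate(const struct toy_zone *z, unsigned long nr,
			 bool favored)
{
	if (nr > z->free_pages)
		return false;
	if (favored)
		return true;
	return z->free_pages - nr >= z->min_wmark;
}

/*
 * Move up to 'batch' reclaimed pages to the free list, but never lift
 * free_pages above min_wmark. Returns the number of pages released.
 */
static unsigned long release_reclaimed(struct toy_zone *z,
				       unsigned long *pending,
				       unsigned long batch)
{
	unsigned long room = z->free_pages < z->min_wmark ?
			     z->min_wmark - z->free_pages : 0;
	unsigned long n = batch;

	if (n > *pending)
		n = *pending;
	if (n > room)
		n = room;
	z->free_pages += n;
	*pending -= n;
	return n;
}
```

In this model a normal thread can never take the released pages, which
is the intended effect. It also shows the failure mode above: if that
normal thread is what the OOM victim is waiting for, it stays stuck.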

(2) If memory zapping code failed to reclaim enough memory from the mm
struct needed for the OOM victim, what mechanism can solve the OOM
stalls?

Some administrators set /proc/pid/oom_score_adj to -1000 for most
enterprise processes (e.g. java), and as a consequence only trivial
processes (e.g. grep / sed) are candidates for OOM victims.

Moreover, a local unprivileged user can easily fool the OOM killer using
decoy tasks (which consume little memory and have /proc/pid/oom_score_adj
set to 999).
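
A rough sketch of why both cases hurt, as a simplified model in C. It
is loosely patterned on the badness heuristic (memory footprint plus
oom_score_adj scaled by total pages) but is not the kernel's code: the
toy_* names are hypothetical, and swap usage, page tables and the
privileged-task discount are left out.

```c
#include <stddef.h>

#define TOY_ADJ_MIN (-1000)	/* oom_score_adj value that exempts a task */

/* Hypothetical, simplified view of a task for scoring purposes. */
struct toy_task {
	const char *comm;
	long rss_pages;		/* resident memory, in pages */
	int oom_score_adj;	/* -1000 .. 1000 */
};

/* Simplified badness score; 0 means "never select this task". */
static long toy_badness(const struct toy_task *t, long totalpages)
{
	long points;

	if (t->oom_score_adj == TOY_ADJ_MIN)
		return 0;
	points = t->rss_pages + (long)t->oom_score_adj * (totalpages / 1000);
	return points > 0 ? points : 1;
}

/* Pick the task with the highest score, or NULL if all are exempt. */
static const struct toy_task *toy_select_victim(const struct toy_task *tasks,
						size_t n, long totalpages)
{
	const struct toy_task *victim = NULL;
	long best = 0;

	for (size_t i = 0; i < n; i++) {
		long points = toy_badness(&tasks[i], totalpages);

		if (points > best) {
			best = points;
			victim = &tasks[i];
		}
	}
	return victim;
}
```

With totalpages = 1000000, a 100-page decoy at oom_score_adj 999 scores
999100 and is chosen ahead of a 600000-page task at 0, while a
900000-page task at -1000 is never considered. Killing the decoy frees
almost nothing.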

(3) If memory zapping code reclaimed no memory due to ->mmap_sem contention,
what mechanism can solve the OOM stalls?

While we don't allocate much memory with ->mmap_sem held for writing,
the task which is holding ->mmap_sem for writing can be chosen as
one of the OOM victims. If such a task receives SIGKILL but TIF_MEMDIE
is not set, it can form an OOM livelock unless all memory allocations
with ->mmap_sem held for writing are __GFP_FS allocations and that task
can reach out_of_memory() (i.e. it is not blocked by unexpected factors
such as waiting for the filesystem's writeback).

After all, I think we have to consider what to do if the memory zapping
code fails.