[patch 00/11] userspace out of memory handling

From: David Rientjes
Date: Tue Mar 04 2014 - 22:58:51 EST


This patchset implements userspace out of memory handling.

It is based on v3.14-rc5. Individual patches will apply cleanly or you
may pull the entire series from

git://git.kernel.org/pub/scm/linux/kernel/git/rientjes/linux.git mm/oom

When the system or a memcg is oom, processes running on that system or
attached to that memcg cannot allocate memory. It is impossible for a
process to reliably handle the oom condition from userspace.

First, consider only system oom conditions. When memory is completely
depleted and nothing may be reclaimed, the kernel is forced to free some
memory; the only way it can do so is to kill a userspace process. This
happens immediately, and userspace can neither enforce its own policy
nor collect information.

On system oom, there may be a hierarchy of memcgs that represent user
jobs, for example. Each job may have a priority independent of its
current memory usage. There is no existing kernel interface to kill the
lowest priority job; with this patchset, userspace can kill the lowest
priority job itself or allow priorities to change based on whether a job
is using more memory than its pre-defined reservation.
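
As a rough illustration of such a policy, a userspace handler might
choose its victim along the following lines. This is only a sketch and
not part of the patchset: struct job, effective_priority() and
pick_victim() are hypothetical names, and both the "larger value is more
important" convention and the one-step demotion of jobs above their
reservation are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-job record kept by a userspace oom handler. */
struct job {
	const char *name;
	int priority;            /* assumption: larger value = more important */
	uint64_t usage_bytes;    /* current memory usage */
	uint64_t reserved_bytes; /* pre-defined reservation */
};

/* Assumed policy: a job above its reservation is demoted by one step. */
int effective_priority(const struct job *j)
{
	return j->usage_bytes > j->reserved_bytes ? j->priority - 1
						  : j->priority;
}

/*
 * Return the job with the lowest effective priority; ties go to the
 * job using the most memory. Returns NULL if there are no jobs.
 */
const struct job *pick_victim(const struct job *jobs, size_t n)
{
	const struct job *victim = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct job *j = &jobs[i];
		int pj, pv;

		if (!victim) {
			victim = j;
			continue;
		}
		pj = effective_priority(j);
		pv = effective_priority(victim);
		if (pj < pv || (pj == pv && j->usage_bytes > victim->usage_bytes))
			victim = j;
	}
	return victim;
}
```

The point is only that the selection logic lives in userspace, so it can
use whatever job metadata the system owner maintains.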

Additionally, users may want to log the condition or debug applications
that are using too much memory. They may wish to collect heap profiles,
or they may be able to free memory without killing a process by
throttling or ratelimiting.

Interactive users using X window environments may wish to have a dialogue
box appear to determine how to proceed -- it may even allow them shell
access to examine the state of the system while oom.

It's not sufficient to simply restrict all user processes to a subset of
memory and oom handling processes to the remainder via a memcg hierarchy:
kernel memory and other page allocations can easily deplete all memory
that is not charged to a user hierarchy of memory.

This patchset allows userspace to do all of these things by defining a
small memory reserve that is accessible only by processes that are
handling the notification.

Second, consider memcg oom conditions. Processes need no special
knowledge of whether they are attached to the root memcg, where memcg
charging will always succeed, or a child memcg where charging will fail
when the limit has been reached. This patchset allows processes handling
memcg oom conditions to overcharge the memcg by the amount of reserved
memory. They need not create child memcgs with smaller limits and
attach the userspace oom handler only to the parent; such support would
not allow userspace to handle system oom conditions anyway.
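
The overcharge rule can be pictured with a toy model. This is not the
kernel's res_counter code; struct memcg_model and model_try_charge() are
hypothetical names, and the model only assumes what is stated above: an
oom handler may charge up to the limit plus the reserve, while everyone
else is held to the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of a memcg's charge state. */
struct memcg_model {
	uint64_t usage;       /* bytes currently charged */
	uint64_t limit;       /* hard limit in bytes */
	uint64_t oom_reserve; /* reserve available to oom handlers only */
};

/*
 * Try to charge nr bytes. Ordinary tasks may not exceed the limit;
 * a task handling the oom notification may exceed it by at most
 * oom_reserve bytes.
 */
bool model_try_charge(struct memcg_model *cg, uint64_t nr, bool is_oom_handler)
{
	uint64_t ceiling = cg->limit;

	if (is_oom_handler) {
		/* Saturate instead of wrapping on very large limits. */
		if (cg->oom_reserve > UINT64_MAX - ceiling)
			ceiling = UINT64_MAX;
		else
			ceiling += cg->oom_reserve;
	}
	if (nr > ceiling || cg->usage > ceiling - nr)
		return false;
	cg->usage += nr;
	return true;
}
```

In this model a memcg at its limit still lets the handler allocate,
which is what makes it possible for the handler to live in the same
memcg as the processes it manages.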

This patchset introduces a standard interface through memcg that allows
both of these conditions to be handled in the same clean way: users
set the reserve through memory.oom_reserve_in_bytes, and the memcg of
the process handling the oom condition may be overcharged by this
amount. If used with the root memcg, this amount is allowed
to be allocated below the per-zone watermarks for root processes that
are handling such conditions (only root may write to
cgroup.event_control for the root memcg).
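
A handler's setup might look roughly like the sketch below. It assumes
the existing cgroup v1 notification convention of writing
"<event_fd> <fd of memory.oom_control>" to cgroup.event_control, plus
the memory.oom_reserve_in_bytes file proposed here. The helper names
(cg_write(), cg_set_oom_reserve(), cg_register_oom_event(),
oom_handler_setup(), wait_for_oom()) are hypothetical, and keeping the
memory.oom_control fd open is a conservative choice for the example
rather than a documented requirement.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Write val to <dir>/<file>. Returns 0 on success, -1 on error. */
int cg_write(const char *dir, const char *file, const char *val)
{
	char path[4096];
	size_t len = strlen(val);
	ssize_t written;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, file);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	written = write(fd, val, len);
	close(fd);
	return written == (ssize_t)len ? 0 : -1;
}

/* Set the reserve; the file name is the one proposed in this series. */
int cg_set_oom_reserve(const char *dir, unsigned long long bytes)
{
	char buf[32];

	snprintf(buf, sizeof(buf), "%llu", bytes);
	return cg_write(dir, "memory.oom_reserve_in_bytes", buf);
}

/*
 * Register efd for oom notifications on the memcg mounted at dir.
 * Returns the open memory.oom_control fd (caller closes it), or -1.
 */
int cg_register_oom_event(const char *dir, int efd)
{
	char path[4096], buf[64];
	int cfd, n;

	n = snprintf(path, sizeof(path), "%s/memory.oom_control", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	cfd = open(path, O_RDONLY);
	if (cfd < 0)
		return -1;
	snprintf(buf, sizeof(buf), "%d %d", efd, cfd);
	if (cg_write(dir, "cgroup.event_control", buf) < 0) {
		close(cfd);
		return -1;
	}
	return cfd;
}

/* Set the reserve and register; returns the eventfd, or -1 on error. */
int oom_handler_setup(const char *dir, unsigned long long reserve, int *cfd)
{
	int efd;

	if (cg_set_oom_reserve(dir, reserve) < 0)
		return -1;
	efd = eventfd(0, EFD_CLOEXEC);
	if (efd < 0)
		return -1;
	*cfd = cg_register_oom_event(dir, efd);
	if (*cfd < 0) {
		close(efd);
		return -1;
	}
	return efd;
}

/* Block until an oom event is signalled on efd. Returns 0 or -1. */
int wait_for_oom(int efd)
{
	uint64_t count;

	return read(efd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 0 : -1;
}
```

A handler would call oom_handler_setup() once with the path of its
memcg directory, then loop on wait_for_oom() and apply its policy each
time it returns. The same calls would apply to the root memcg directory
for system oom handling, subject to the root-only restriction noted
above.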
---
 Documentation/cgroups/memory.txt           |  46 ++++++++-
 Documentation/cgroups/resource_counter.txt |  12 +--
 Documentation/sysctl/vm.txt                |   5 +
 arch/m32r/mm/discontig.c                   |   1 +
 include/linux/memcontrol.h                 |  24 +++++
 include/linux/mempolicy.h                  |   3 +-
 include/linux/mmzone.h                     |   2 +
 include/linux/res_counter.h                |  16 ++--
 include/linux/sched.h                      |   2 +-
 kernel/fork.c                              |  13 +--
 kernel/res_counter.c                       |  42 ++++++---
 mm/memcontrol.c                            | 144 ++++++++++++++++++++++++++++-
 mm/mempolicy.c                             |  46 ++-------
 mm/oom_kill.c                              |   7 ++
 mm/page_alloc.c                            |  17 +++-
 mm/slab.c                                  |   8 +-
 mm/slub.c                                  |   2 +-
 17 files changed, 292 insertions(+), 98 deletions(-)