[PATCH] oom: allow a non-CAP_SYS_RESOURCE process to oom_score_adj down

From: Mandeep Singh Baines
Date: Fri Nov 12 2010 - 19:47:16 EST


We'd like to be able to adjust a process's oom_score_adj up and down as it
enters and leaves the foreground. Currently, it is not possible to lower
oom_score_adj without CAP_SYS_RESOURCE. This patch allows a task to decrease
its oom_score_adj back to the value that a CAP_SYS_RESOURCE thread last set,
or to the value it inherited at fork. For example, if the thread that forked
each tab has an oom_score_adj of 0, a tab can lower its value back to 0 upon
activation, unless a CAP_SYS_RESOURCE thread has raised the minimum to
something higher.
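
The rule described above can be sketched as a small userspace model. This
is only an illustration, not the kernel code: struct oom_state,
model_oom_score_adj_write() and the `privileged` flag (standing in for
capable(CAP_SYS_RESOURCE)) are hypothetical names, and range checking of
the written value is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two per-process fields this patch deals with. */
struct oom_state {
	int oom_score_adj;	/* current adjustment */
	int oom_score_adj_min;	/* floor, moved only by privileged writers */
};

/*
 * Model of the write rule. Returns 0 on success, or -EACCES when an
 * unprivileged writer asks for a value below the recorded floor.
 * `privileged` stands in for capable(CAP_SYS_RESOURCE).
 */
int model_oom_score_adj_write(struct oom_state *s, int value, bool privileged)
{
	assert(s != NULL);

	/* Unprivileged writers may go up freely, but never below the floor. */
	if (value < s->oom_score_adj_min && !privileged)
		return -EACCES;

	s->oom_score_adj = value;

	/* A privileged write also becomes the new floor. */
	if (privileged)
		s->oom_score_adj_min = value;

	return 0;
}
```

In this model, a task starting at 0 can raise itself to 500 and later
return to 0 without privilege, but cannot go below 0. If a privileged
writer sets 300, unprivileged writes below 300 are refused.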

Alternatives considered:

* a setuid binary
* a daemon with CAP_SYS_RESOURCE

Since you don't want all processes to be able to reduce their
oom_score_adj, a setuid binary or daemon implementation would be complex.
The alternatives also have much higher overhead.
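
By contrast, with the floor tracked in the kernel a process can adjust
itself with a plain write to its own proc file. A minimal sketch, assuming
the usual /proc/<pid>/oom_score_adj interface; write_oom_score_adj() is a
hypothetical helper name, and the values in the usage note are examples
only:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Write `value` to an oom_score_adj file such as
 * /proc/self/oom_score_adj. Returns 0 on success or a negative errno.
 */
int write_oom_score_adj(const char *path, int value)
{
	FILE *f;
	int err = 0;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%d\n", value) < 0)
		err = -EIO;

	/*
	 * stdio buffers the data, so a write rejected by the kernel
	 * (for example with EACCES) shows up when the stream is
	 * flushed at fclose() time.
	 */
	if (fclose(f) != 0 && err == 0)
		err = -errno;

	return err;
}
```

A process could call write_oom_score_adj("/proc/self/oom_score_adj", 500)
when it moves to the background and write 0 again when it returns to the
foreground. With this patch the second call is expected to succeed without
CAP_SYS_RESOURCE as long as 0 is not below the recorded minimum.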

This patch was updated based on feedback from
David Rientjes <rientjes@xxxxxxxxxx>.

Change-Id: If8f52363fd6c156e1730f43148aee987260e9c72
Signed-off-by: Mandeep Singh Baines <msb@xxxxxxxxxxxx>
---
fs/proc/base.c | 4 +++-
include/linux/sched.h | 2 ++
2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index f3d02ca..e617413 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1164,7 +1164,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
 		goto err_task_lock;
 	}

-	if (oom_score_adj < task->signal->oom_score_adj &&
+	if (oom_score_adj < task->signal->oom_score_adj_min &&
 			!capable(CAP_SYS_RESOURCE)) {
 		err = -EACCES;
 		goto err_sighand;
@@ -1177,6 +1177,8 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
 			atomic_dec(&task->mm->oom_disable_count);
 	}
 	task->signal->oom_score_adj = oom_score_adj;
+	if (capable(CAP_SYS_RESOURCE))
+		task->signal->oom_score_adj_min = oom_score_adj;
 	/*
 	 * Scale /proc/pid/oom_adj appropriately ensuring that OOM_DISABLE is
 	 * always attainable.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f53cdf2..2a71ee0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -626,6 +626,8 @@ struct signal_struct {

 	int oom_adj;		/* OOM kill score adjustment (bit shift) */
 	int oom_score_adj;	/* OOM kill score adjustment */
+	int oom_score_adj_min;	/* OOM kill score adjustment minimum value.
+				 * Only settable by CAP_SYS_RESOURCE. */

 	struct mutex cred_guard_mutex;	/* guard against foreign influences on
 					 * credential calculations
--
1.7.3.1
