Re: [PATCH] oom: allow a non-CAP_SYS_RESOURCE process to oom_score_adj down

From: David Rientjes
Date: Sat Nov 13 2010 - 20:40:08 EST


On Fri, 12 Nov 2010, Mandeep Singh Baines wrote:

> We'd like to be able to oom_score_adj a process up/down as it
> enters/leaves the foreground. Currently, it is not possible to oom_adj
> down without CAP_SYS_RESOURCE. This patch allows a task to decrease
> its oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set
> it to, or to its inherited value at fork. Assuming the thread that has forked
> it has oom_score_adj of 0, each tab could decrease it back from 0 upon
> activation unless a CAP_SYS_RESOURCE thread elevated it to something
> higher.
>

oom_score_adj_min doesn't appear to be inherited at fork in your patch.

> Alternative considered:
>
> * a setuid binary
> * a daemon with CAP_SYS_RESOURCE
>
> Since you don't want all processes to be able to reduce their
> oom_adj, a setuid or daemon implementation would be complex. The
> alternatives also have much higher overhead.
>

This behavior should be documented in Documentation/filesystems/proc.txt.

> This patch updated based on feedback from
> David Rientjes <rientjes@xxxxxxxxxx>.
>
> Change-Id: If8f52363fd6c156e1730f43148aee987260e9c72

I know what a Change-Id is, but nobody else here does :)

> Signed-off-by: Mandeep Singh Baines <msb@xxxxxxxxxxxx>
> ---
> fs/proc/base.c | 4 +++-
> include/linux/sched.h | 2 ++
> 2 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index f3d02ca..e617413 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1164,7 +1164,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
>  		goto err_task_lock;
>  	}
>
> -	if (oom_score_adj < task->signal->oom_score_adj &&
> +	if (oom_score_adj < task->signal->oom_score_adj_min &&
>  			!capable(CAP_SYS_RESOURCE)) {
>  		err = -EACCES;
>  		goto err_sighand;
> @@ -1177,6 +1177,8 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
>  			atomic_dec(&task->mm->oom_disable_count);
>  	}
>  	task->signal->oom_score_adj = oom_score_adj;
> +	if (capable(CAP_SYS_RESOURCE))
> +		task->signal->oom_score_adj_min = oom_score_adj;
>  	/*
>  	 * Scale /proc/pid/oom_adj appropriately ensuring that OOM_DISABLE is
>  	 * always attainable.
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f53cdf2..2a71ee0 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -626,6 +626,8 @@ struct signal_struct {
>
>  	int oom_adj;		/* OOM kill score adjustment (bit shift) */
>  	int oom_score_adj;	/* OOM kill score adjustment */
> +	int oom_score_adj_min;	/* OOM kill score adjustment minimum value.
> +				 * Only settable by CAP_SYS_RESOURCE. */
>
>  	struct mutex cred_guard_mutex;	/* guard against foreign influences on
>  					 * credential calculations
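
For reference, the user-space side of the foreground/background use case
could look roughly like the sketch below. write_oom_score_adj() is a
hypothetical helper, not an existing API; the path is a parameter so that it
can be exercised against an ordinary file, and in real use it would be
"/proc/self/oom_score_adj". The -1000..1000 range is the documented range of
oom_score_adj.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define OOM_SCORE_ADJ_MIN	(-1000)
#define OOM_SCORE_ADJ_MAX	1000

/*
 * Hypothetical helper: write an oom_score_adj value to the given path.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_oom_score_adj(const char *path, int value)
{
	FILE *f;
	int err = 0;

	assert(path != NULL);

	/* Reject out-of-range values before touching the file. */
	if (value < OOM_SCORE_ADJ_MIN || value > OOM_SCORE_ADJ_MAX)
		return -EINVAL;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%d\n", value) < 0)
		err = -EIO;

	/*
	 * stdio buffers the write, so a rejection by the kernel
	 * (for example EACCES) is normally reported here at flush time.
	 */
	if (fclose(f) != 0 && !err)
		err = -errno;

	return err;
}
```

A tab would call this with a higher value when it goes to the background and
with the lower value again when it returns to the foreground; with the patch
applied, the second write is expected to succeed without CAP_SYS_RESOURCE as
long as the value is not below oom_score_adj_min.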