Re: [kernel-hardening] Re: [PATCH] move RLIMIT_NPROC check from set_user() to do_execve_common()

From: NeilBrown
Date: Thu Jul 21 2011 - 00:10:14 EST

On Fri, 15 Jul 2011 15:54:43 -0400 Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:

> On Fri, 2011-07-15 at 19:26 +0400, Vasiliy Kulikov wrote:
> > Hi Stephen,
> >
> > On Fri, Jul 15, 2011 at 09:58 -0400, Stephen Smalley wrote:
> > > Does this have implications for Android's zygote model? There you have
> > > a long running uid 0 / all caps process (the zygote), which forks itself
> > > upon receiving a request to spawn an app and then calls
> >
> > > setgroups();
> > > setrlimit(); setgid(); setuid();
> >
> > Is RLIMIT_NPROC enforced in your model, with setuid() expected to fail
> > when the limit is exceeded? If not, then it is not affected at all.
> I don't know what their intent is. But it is an example of a case where
> moving the enforcement from setuid() to a subsequent execve() causes the
> check to never get applied. As to whether or not they care, I don't
> know. An app that calls fork() repeatedly will still be stopped, but an
> app that repeatedly connects to the zygote and asks to spawn another
> instance of itself would be unlimited.
> OTOH, the current RLIMIT_NPROC check and lack of setuid() error checking
> has been a repeated issue for Android.

So where does this leave us? Between a rock and a hard place?

It says to me that moving the check from set_user to execve is simply
Wrong(TM). It may be convenient and do TheRightThing in certain common
cases, but it can also do the Wrong thing in other cases, and I don't think
that is an acceptable trade-off.

Having setuid succeed when it should fail is simply incorrect.

The problem - as we all know - is that user space doesn't always check error
status properly. If we were to look for precedent I would point to SIGPIPE.
The only reason for that to exist is because programs don't always check that
a "write" succeeds so we have a mechanism to kill off processes that don't
check the error status and keep sending.

I would really like to apply that to this problem ... but that has already
been suggested (years ago) and found wanting. Maybe we can revisit it?

The value of the SIGPIPE approach (whether it is SIGPIPE or SIGXCPU or
SIGXFSZ ... if only there were a SIGXNPROC) is that the program remains in
complete control. It can find out if the NPROC limit has been exceeded at
the "right" time.

The only other solution that I can think of which isn't "Wrong(TM)" is my
first patch which introduced PF_SETUSER_FAILED.
With this patch setuid() still fails if it should, so the calling process
still remains in control. But if it fails to exercise that control, the
kernel steps in.

Vasiliy didn't like that because it allows a process to ignore the setuid
failure, perform some in-process activity as root while expecting to be
running as some-other-user, and only fail when execve is attempted -
possibly too late.

Against this I ask: what exactly is our goal here?
Is it to stop all possible abuses? I think not. That is impossible.
Is it to stop certain known and commonly repeated errors? I think so. That
is clearly valuable.

We know (thanks to Solar Designer's list) that an unchecked setuid followed
by execve is a commonly repeated error, so trapping that can be justified.
Do we know that accessing the filesystem first is a commonly repeated error?
If not, there is no clear motive to deal with that problem.
If, however, it is then maybe a check for PF_SETUSER_FAILED in
inode_permission() would be the right thing.

Or maybe we just set PF_SETUSER_FAILED and leave it up to some security
module to decide what to disable or report when that is set?

In short: I don't think there can be a solution that is both completely
correct and completely safe. I would go for "as correct as possible" with
"closes common vulnerabilities".
