Re: [ISSUE] mm: Add a user_ns owner to mm_struct and fix ptrace_may_access

From: Eric W. Biederman
Date: Mon Oct 24 2016 - 15:03:54 EST



Adding the containers list because that is the general place for these
kinds of discussions.

Cyrill Gorcunov <gorcunov@xxxxxxxxx> writes:

> Hi Eric! A few days ago we've noticed that our zombie00 test case started
> failing: https://ci.openvz.org/job/CRIU/view/All/job/CRIU-linux-next/406/console

> ---
> ======================== Run zdtm/static/zombie00 in h =========================
> Start test
> ./zombie00 --pidfile=zombie00.pid --outfile=zombie00.out
> Run criu dump
> Run criu restore
> Send the 15 signal to 30
> Wait for zdtm/static/zombie00(30) to die for 0.100000
> ################ Test zdtm/static/zombie00 FAIL at result check ################
>
> I've narrowed problem down to commit
>
> | From ce99dd5fd5f600f9f4f0d37bb8847c1cb7c6e4fc Mon Sep 17 00:00:00 2001
> | From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> | Date: Thu, 13 Oct 2016 21:23:16 -0500
> | Subject: [PATCH] mm: Add a user_ns owner to mm_struct and fix
> | ptrace_may_access
> |
> | During exec dumpable is cleared if the file that is being executed is
> | not readable by the user executing the file. A bug in
> | ptrace_may_access allows reading the file if the executable happens to
> | enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
> | unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
>
> and the reason is that the zombie tasks do not have task::mm and in resut
> we're obtaining -EPERM when trying to read task->exit_code from
> /proc/pid/stat.

Hmm. As I read the code exit_code should be returned to userspace as a
0. It does not look to me as if userspace should see an error in
that case.

> Looking into commit I suspect when mm = NULL we've to move back the test
> ptrace_has_cap(__task_cred(task)->user_ns, mode)?

Maybe.

We might want to consider if these lines from do_task_stat make
any sense.

if (permitted)
seq_put_decimal_ll(m, " ", task->exit_code);
else
seq_puts(m, " 0");

Looking at the code. Nothing changes behavior except for privileged
tasks looking at processes without an mm. So yes it may be sane
to tweak that part of the check.

AKA in the in for-next branch the code currenty says:
mm = task->mm;
if (!mm ||
((get_dumpable(mm) != SUID_DUMP_USER) &&
!ptrace_has_cap(mm->user_ns, mode)))
return -EPERM;

And in the case there is no mm there is no way to get
past returning -EPERM.

Looking at why we use ptrace_may_access in the exit_code case
I see a couple of relevant commits.

The commit that added the exit code check:

commit f83ce3e6b02d5e48b3a43b001390e2b58820389d
Author: Jake Edge <jake@xxxxxxx>
Date: Mon May 4 12:51:14 2009 -0600

proc: avoid information leaks to non-privileged processes

By using the same test as is used for /proc/pid/maps and /proc/pid/smaps,
only allow processes that can ptrace() a given process to see information
that might be used to bypass address space layout randomization (ASLR).
These include eip, esp, wchan, and start_stack in /proc/pid/stat as well
as the non-symbolic output from /proc/pid/wchan.

ASLR can be bypassed by sampling eip as shown by the proof-of-concept
code at http://code.google.com/p/fuzzyaslr/ As part of a presentation
(http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were
also noted as possibly usable information leaks as well. The
start_stack address also leaks potentially useful information.

Cc: Stable Team <stable@xxxxxxxxxx>
Signed-off-by: Jake Edge <jake@xxxxxxx>
Acked-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
Acked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>


The change that started protecting start_code/end_code and
generally using these permissions to protect this class of information:

commit 5883f57ca0008ffc93e09cbb9847a1928e50c6f3
Author: Kees Cook <kees.cook@xxxxxxxxxxxxx>
Date: Wed Mar 23 16:42:53 2011 -0700

proc: protect mm start_code/end_code in /proc/pid/stat

While mm->start_stack was protected from cross-uid viewing (commit
f83ce3e6b02d5 ("proc: avoid information leaks to non-privileged
processes")), the start_code and end_code values were not. This would
allow the text location of a PIE binary to leak, defeating ASLR.

Note that the value "1" is used instead of "0" for a protected value since
"ps", "killall", and likely other readers of /proc/pid/stat, take
start_code of "0" to mean a kernel thread and will misbehave. Thanks to
Brad Spengler for pointing this out.

Addresses CVE-2011-0726

Signed-off-by: Kees Cook <kees.cook@xxxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxx>
Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx>
Cc: David Howells <dhowells@xxxxxxxxxx>
Cc: Eugene Teo <eugeneteo@xxxxxxxxx>
Cc: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
Cc: Brad Spengler <spender@xxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>


The commit that added task->exit_code:

commit 5b172087f99189416d5f47fd7ab5e6fb762a9ba3
Author: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Date: Thu May 31 16:26:44 2012 -0700

c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat

We would like to have an ability to restore command line arguments and
program environment pointers but first we need to obtain them somehow.
Thus we put these values into /proc/$pid/stat. The exit_code is needed to
restore zombie tasks.

Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Acked-by: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
Cc: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Andrew Vagin <avagin@xxxxxxxxxx>
Cc: Vasiliy Kulikov <segoon@xxxxxxxxxxxx>
Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx>
Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>


Looking at do_task_stat everything else that requires permitted
in do_tack_stat is an address. exit_code is something else so
I am not at all certain the ptrace_may_access permission check
makes sense.

A process without an mm is fundamentally undumpable so an error should
be returned in any case. So I don't see any harm in failing
ptrace_may_access in general. At the same time I can see how not
preserving the existing behavior is problematic.

So I am probably going to tweak the !mm case so that instead of failing
we perform the old capable check in that case. That seems the mot
certain way to avoid regressions. With that said, why is exit_code
behind a ptrace_may_access permission check?

Eric