Re: [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl

From: Christian Brauner
Date: Thu Jan 26 2023 - 03:26:09 EST


On Thu, Jan 26, 2023 at 02:28:47AM +1100, Aleksa Sarai wrote:
> On 2023-01-24, Giuseppe Scrivano <gscrivan@xxxxxxxxxx> wrote:
> > Aleksa Sarai <cyphar@xxxxxxxxxx> writes:
> >
> > > On 2023-01-20, Giuseppe Scrivano <gscrivan@xxxxxxxxxx> wrote:
> > >> This patch adds a new prctl called PR_HIDE_SELF_EXE which allows
> > >> processes to hide their own /proc/*/exe file. When this prctl is
> > >> used, every access to /proc/*/exe for the calling process will
> > >> fail with ENOENT.
> > >>
> > >> This is useful for preventing issues like CVE-2019-5736, where an
> > >> attacker can gain host root access by overwriting the binary
> > >> in OCI runtimes through file-descriptor mishandling in containers.
> > >>
> > >> The current fix for CVE-2019-5736 is to create a read-only copy or
> > >> a bind-mount of the current executable, and then re-exec the current
> > >> process. With the new prctl, the read-only copy or bind-mount copy is
> > >> not needed anymore.
> > >>
> > >> While map_files/ also might contain symlinks to files in host,
> > >> proc_map_files_get_link() permissions checks are already sufficient.
> > >
> > > I suspect this doesn't protect against the execve("/proc/self/exe")
> > > tactic (because it clears the bit on execve), so I'm not sure this is
> > > much safer than PR_SET_DUMPABLE (yeah, it stops root in the source
> > > userns from accessing /proc/$pid/exe but the above attack makes that no
> > > longer that important).
> >
> > it protects against that attack too. It clears the bit _after_ the
> > execve() syscall is done.
> >
> > If you attempt execve("/proc/self/exe") you still get ENOENT:
> >
> > ```
> > #include <stdlib.h>
> > #include <stdio.h>
> > #include <sys/prctl.h>
> > #include <unistd.h>
> >
> > int main(void)
> > {
> > int ret;
> >
> > ret = prctl(65, 1, 0, 0, 0);
> > if (ret != 0)
> > exit(1);
> >
> > execl("/proc/self/exe", "foo", NULL);
> > exit(2);
> > }
> > ```
> >
> > # strace -e prctl,execve ./hide-self-exe
> > execve("./hide-self-exe", ["./hide-self-exe"], 0x7fff975a3690 /* 39 vars */) = 0
> > prctl(0x41 /* PR_??? */, 0x1, 0, 0, 0) = 0
> > execve("/proc/self/exe", ["foo"], 0x7ffcf51868b8 /* 39 vars */) = -1 ENOENT (No such file or directory)
> > +++ exited with 2 +++
> >
> > I've also tried execv'ing with a script that uses "#!/proc/self/exe" and
> > I get the same ENOENT.
>
> Ah, you're right. As you mentioned, you could still do the attack
> through /proc/self/map_files but that would require you to know where
> the binary will be located (and being non-dumpable blocks container
> processes from doing tricks to get the right path).
>
> I wonder if we should somehow require (or auto-apply) SUID_DUMP_NONE
> when setting this prctl, since it does currently depend on it to be
> properly secure...
>
> > > I think the only way to fix this properly is by blocking re-opens of
> > > magic links that have more permissions than they originally did. I just
> > > got back from vacation, but I'm working on fixing up [1] so it's ready
> > > to be an RFC so we can close this hole once and for all.
> >
> > so that relies on the fact opening /proc/self/exe with O_WRONLY fails
> > with ETXTBSY?
>
> Not quite, it relies on the fact that /proc/self/exe (and any other
> magiclink to /proc/self/exe) does not have a write mode (semantically,
> because of -ETXTBSY) and thus blocks any attempt to open it (or re-open
> it) with a write mode. It also fixes some other possible issues and lets
> you have upgrade masks (a-la capabilities) to file descriptors.
>
> Ultimately I think having a complete "no really, nobody can touch this"
> knob is also a good idea, and as this is is much simpler we can it in
> much quicker than the magiclink stuff (which I still think is necessary
> in general).

It definitely but let's not tie our generic vfs apis to this problem.