Re: [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl

From: Giuseppe Scrivano
Date: Tue Feb 28 2023 - 09:20:25 EST


Andrei Vagin <avagin@xxxxxxxxx> writes:

> On Mon, Jan 30, 2023 at 11:06:02AM +0100, Christian Brauner wrote:
>> On Mon, Jan 30, 2023 at 10:53:31AM +0100, Christian Brauner wrote:
>> > On Sun, Jan 29, 2023 at 01:12:45PM -0500, Colin Walters wrote:
>> > >
>> > >
>> > > On Sun, Jan 29, 2023, at 11:58 AM, Christian Brauner wrote:
>> > > > On Sun, Jan 29, 2023 at 08:59:32AM -0500, Colin Walters wrote:
>> > > >>
>> > > >>
>> > > >> On Wed, Jan 25, 2023, at 11:30 AM, Giuseppe Scrivano wrote:
>> > > >> >
>> > > >> > After reading some comments on the LWN.net article, I wonder if
>> > > >> > PR_HIDE_SELF_EXE should apply to CAP_SYS_ADMIN in the initial user
>> > > >> > namespace or if in this case root should keep the privilege to inspect
>> > > >> > the binary of a process. If a container runs with that many privileges
>> > > >> > then it has already other ways to damage the host anyway.
>> > > >>
>> > > >> Right, that's what I was trying to express with the "make it
>> > > >> work the same as map_files". Hiding the entry entirely even
>> > > >> for initial-namespace-root (real root) seems like it's going
>> > > >> to potentially confuse profiling/tracing/debugging tools for
>> > > >> no good reason.
>> > > >
>> > > > If this can be circumvented via CAP_SYS_ADMIN
>> > >
>> > > To be clear, I'm proposing CAP_SYS_ADMIN in the current user
>> > > namespace at the time of the prctl(). (Or if keeping around a
>> > > reference just for this is too problematic, perhaps hardcoding
>> > > to the init ns)
>> >
>> > Oh no, I fully understand. The point was that the userspace fix protects
>> > even against attackers with CAP_SYS_ADMIN in init_user_ns. And that was
>> > important back then and is still relevant today for some workloads.
>> >
>> > For unprivileged containers where host and container are separate by a
>> > meaningful user namespace boundary this whole mitigation is irrelevant
>> > as the binary can't be overwritten.
>> >
>> > >
>> > > A process with CAP_SYS_ADMIN in a child namespace would still not be able to read the binary.
>> > >
>> > > > then this mitigation
>> > > > becomes immediately way less interesting because the userspace
>> > > > mitigation we came up with protects against CAP_SYS_ADMIN as well
>> > > > without any regression risk.
>> > >
>> > > The userspace mitigation here being "clone self to memfd"? But that's a sufficiently ugly workaround that it's created new problems; see https://lwn.net/Articles/918106/
>> >
>> > But this is a problem with the memfd api not with the fix. Following the
>> > thread the ability to create executable memfds will stay around. As it
>> > should be given how long this has been supported. And they have backward
>> > compatibility in mind which is great.
>>
>> Following up from yesterday's promise to check with the criu org I'm
>> part of: this is going to break criu unforunately as it dumps (and
>> restores) /proc/self/exe. Even with an escape hatch we'd still risk
>> breaking it. Whereas again, the memfd solution doesn't cause those
>> issues.
>>
>> Don't get me wrong it's pretty obvious that I was pretty supportive of
>> this fix especially because it looked rather simple but this is turning
>> out to be less simple than we tought. I don't think that this is worth
>> it given the functioning fixes we already have.
>
> btw: can we use PR_SET_MM_EXE_FILE or PR_SET_MM_MAP (prctl_map.exe_fd) to
> set a dummy exe. Will it have the required effect?

if I am understanding it correctly, that seems a bit more complicated,
we first need to unmap the current executable and then replace it with
its copy?

Creating the dummy exe could also be a problem since we need a new copy
each time we want to hide the executable.

Or are you suggesting using it differently?

Thanks,
Giuseppe

>> The good thing is that - even if it will take a longer - that Aleksa's
>> patchset will provide a more general solution by making it possible for
>> runc/crun/lxc to open the target binary with a restricted upgrade mask
>> making it impossible to open the binary read-write again. This won't
>> break criu and will fix this issue and is generally useful.