Re: [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl

From: Colin Walters
Date: Mon Jan 23 2023 - 17:09:12 EST




On Mon, Jan 23, 2023, at 2:21 PM, Giuseppe Scrivano wrote:
> "Colin Walters" <walters@xxxxxxxxxx> writes:
>
>> On Fri, Jan 20, 2023, at 5:25 AM, Giuseppe Scrivano wrote:
>>> This patch adds a new prctl called PR_HIDE_SELF_EXE which allows
>>> processes to hide their own /proc/*/exe file. When this prctl is
>>> used, every access to /proc/*/exe for the calling process will
>>> fail with ENOENT.
>>
>> How about a mount option for procfs like `mount -t procfs procfs /proc -o rw,nosuid,nodev,magiclink-no-xdev`
>>
>> Where `magiclink-no-xdev` would cause all magic links to fail to cross a pid namespace or so?
>
> wouldn't that break also stuff like "/proc/self/fd/$FD" after you join a
> different PID namespace?

Ah, right. Hmm. But building on the reply to
https://lwn.net/Articles/920876/
maybe an opt-in flag like `-o magiclink-restricted=/proc/<pid>/ns/pid` that required callers to have CAP_SYS_ADMIN in the referenced namespace? Then things like crun/runc would open a fd for `/proc/self/ns/pid` when they start (usually, this is the init ns), then pass the reference to that fd into magiclink-restricted.

There may be a more elegant userspace API here; I think my overall point is reiterating what I mentioned in
https://github.com/containers/crun/pull/1105#issuecomment-1360085059

"A minor worry I have with both is that any namespacing-based approach that controls visibility in /proc runs the risk of someone adding a new way to get to the binary; maybe it's something like us having a fd for ourselves open under /proc/pid/fd/ or so."

So instead of hiding just /proc/self/exe, we add some sort of API that aims to restrict all other magic links that may be added in the future.

The history of map_files is interesting here; it looks like https://github.com/torvalds/linux/commit/bdb4d100afe9818aebd1d98ced575c5ef143456c introduces a comment that says

"* Only allow CAP_SYS_ADMIN to follow the links, due to concerns about how the
* symlinks may be used to bypass permissions on ancestor directories in the
* path to the file in question."

yet isn't there an inconsistency here in not applying the same restrictions on /proc/self/exe?

Or another way to look at this is that if we were to add some sort of API like this on /proc, we'd also change the proc_maps code to also honor it *instead* if present instead of limiting to the init ns?