Re: [PATCH v4 2/2] procfs/tasks: add a simple per-task procfs hidepid= field

From: Djalal Harouni
Date: Wed Jan 18 2017 - 17:50:32 EST


On Tue, Jan 17, 2017 at 9:33 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Mon, Jan 16, 2017 at 9:15 AM, Djalal Harouni <tixxdz@xxxxxxxxx> wrote:
>> Cc linux-api
>>
>> On Mon, Jan 16, 2017 at 2:23 PM, Djalal Harouni <tixxdz@xxxxxxxxx> wrote:
>>>
>>> From: Djalal Harouni <tixxdz@xxxxxxxxx>
>>>
>>> This adds a new per-task hidepid= flag that is honored by procfs when
>>> presenting /proc to the user, in addition to the existing hidepid= mount
>>> option. So far, hidepid= was exclusively a per-pidns setting. Locking
>>> down a set of processes so that they cannot see other users' processes
>>> without affecting the rest of the system thus currently requires
>>> creation of a private PID namespace, with all the complexity it brings,
>>> including maintaining a stub init process as PID 1 and losing the
>>> ability to see processes of the same user on the rest of the system.
>>>
>>> With this patch all access and visibility checks in procfs now
>>> honour two fields:
>>>
>>> a) the existing hide_pid field in the PID namespace
>>> b) the new hide_pid in struct task_struct
>>>
>>> Access/visibility is only granted if both fields permit it; the more
>>> restrictive one wins. The new task_struct hide_pid value defaults to
>>> 0, which means behaviour is unchanged from the status quo.
>>>
>>> Setting the per-process hide_pid value is done via a new PR_SET_HIDEPID
>>> prctl() option which takes the same three supported values as the
>>> hidepid= mount option. The per-process hide_pid may only be increased,
>>> never decreased, thus ensuring that once applied, processes can never
>>> escape such a hide_pid jail. When a process forks it inherits its
>>> parent's hide_pid value.
>>>
>>> Suggested use case: let's say nginx runs as user "www-data". After
>>> dropping privileges it may now call:
>>>
>>>         prctl(PR_SET_HIDEPID, 2);
>>>
>>> And from that point on neither nginx itself, nor any of its child
>>> processes may see processes in /proc anymore that belong to a different
>>> user than "www-data". Other services running on the same system remain
>>> unaffected.
>
> What effect, if any, does this have on ptrace() permissions?

This should not affect ptrace() permissions or other system calls that
work directly on pids. The hidepid test in procfs is done on the /proc
inodes, before the ptrace access check. Hmm, what do you have in mind?
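To make the ordering concrete, here is a rough userspace model of the
procfs side. It is only a sketch, not the patch code: pidns_model,
task_model, proc_effective_hidepid() and may_see_task() are made-up
names, the HIDEPID_* values just mirror the hidepid= levels, and the
gid= exemption is left out. It is loosely modelled on
has_pid_permissions() in fs/proc/base.c, with the ptrace-style result
passed in as a flag.

```c
#include <stdbool.h>

/* Mirrors the three levels accepted by the hidepid= mount option. */
enum hidepid_level {
	HIDEPID_OFF       = 0, /* every /proc/<pid> visible and accessible */
	HIDEPID_NO_ACCESS = 1, /* other users' /proc/<pid> contents denied */
	HIDEPID_INVISIBLE = 2, /* other users' /proc/<pid> not even listed */
};

/* Hypothetical stand-ins for struct pid_namespace and struct task_struct. */
struct pidns_model { int hide_pid; };
struct task_model  { int hide_pid; };

/* The more restrictive of the per-pidns and per-task settings wins. */
static int proc_effective_hidepid(const struct pidns_model *ns,
				  const struct task_model *viewer)
{
	return ns->hide_pid > viewer->hide_pid ? ns->hide_pid
					       : viewer->hide_pid;
}

/*
 * If the effective level is below the level this access needs, the
 * inode is visible without further checks; otherwise fall back to the
 * ptrace-style access check, whose result is passed in as ptrace_ok.
 * Nothing here changes what ptrace() itself permits.
 */
static bool may_see_task(const struct pidns_model *ns,
			 const struct task_model *viewer,
			 int hide_pid_min, bool ptrace_ok)
{
	if (proc_effective_hidepid(ns, viewer) < hide_pid_min)
		return true;
	return ptrace_ok;
}
```

So a task with hide_pid=0 in a pidns mounted without hidepid= sees
everything as today, while a task that raised its own hide_pid to 2
only sees the entries it would pass the ptrace-style check for.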


> Also, this one-way thing seems wrong to me. I think it should roughly
> follow the no_new_privs rules instead. IOW, if you unshare your
> pidns, it gets cleared. Also, maybe you shouldn't be able to set it

Andy, I don't follow here: no_new_privs is never cleared, right? I
can't see any code that clears that bit.

For this one I want it to act like no_new_privs. Also, a pidns can be
created together with a userns, which means the setting could then be
revoked. For my use case I want it to be part of *one* single
operation, where it is set together with the other sandbox operations
that are all preserved, instead of setting it *again* each time, when
it may already be too late.
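A minimal sketch of the "only goes up, inherited on fork" semantics I
mean, as a userspace model rather than the actual patch:
task_model, set_task_hidepid() and fork_inherit_hidepid() are
hypothetical names, and returning -EPERM on an attempt to lower the
value is my assumption for illustration, not something the patch
necessarily does.

```c
#include <errno.h>

/* Hypothetical per-task state; in the patch this lives in task_struct. */
struct task_model {
	int hide_pid;
};

/*
 * Hypothetical handler for prctl(PR_SET_HIDEPID, value): accept only
 * the three hidepid= levels (0, 1, 2) and never let the value go down,
 * so, like no_new_privs, it cannot be relaxed once set.
 */
static int set_task_hidepid(struct task_model *t, unsigned long value)
{
	if (value > 2)
		return -EINVAL;
	if ((int)value < t->hide_pid)
		return -EPERM; /* assumed error code for a decrease */
	t->hide_pid = (int)value;
	return 0;
}

/* On fork the child simply inherits the parent's value. */
static void fork_inherit_hidepid(struct task_model *child,
				 const struct task_model *parent)
{
	child->hide_pid = parent->hide_pid;
}
```

Because the value is only copied on fork and never reset (not on
unshare(CLONE_NEWPID), not on exec), it is set once together with the
rest of the sandbox and stays in place for the whole process tree.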


> without either having CAP_SYS_ADMIN over your userns or having
> no_new_privs set.

For this one, sure, I can add it. Historically that logic was added to
make seccomp more usable; in this patch the values can't be relaxed,
they are only ever increased, never decreased. However, one minor
advantage of requiring no_new_privs is that hidepid would then also
guarantee that you can't setuid to gain access to some procfs inodes...
though you can also just set 'no_new_privs + hidepid' yourself, in
either order. The current version also allows unprivileged processes
without a userns to set up a minimal jail while still performing some
operations that no_new_privs would block.

Andy, Kees, any other comments on this, please? I'm not sure that
overusing no_new_privs in this case is a good idea. It seems to me that
seccomp + no_new_privs is a different case from this hidepid feature,
which already overlaps nicely with no_new_privs.

If there are no responses to this question, then I will just add the
"CAP_SYS_ADMIN || no_new_privs" test in the next iteration.
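Roughly, the check would gate the setter the same way seccomp gates
unprivileged filters (seccomp returns -EACCES there). Again a userspace
sketch only: caller_model and prctl_set_hidepid() are hypothetical, the
two booleans stand in for what ns_capable(current_user_ns(),
CAP_SYS_ADMIN) and task_no_new_privs(current) would report in the
kernel, and the -EACCES/-EPERM choices are assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the calling task's state. */
struct caller_model {
	bool cap_sys_admin; /* stands in for ns_capable(..., CAP_SYS_ADMIN) */
	bool no_new_privs;  /* stands in for task_no_new_privs(current) */
	int hide_pid;
};

static int prctl_set_hidepid(struct caller_model *c, unsigned long value)
{
	/* Same "CAP_SYS_ADMIN || no_new_privs" gate that seccomp uses. */
	if (!c->cap_sys_admin && !c->no_new_privs)
		return -EACCES;
	if (value > 2)
		return -EINVAL;
	/* Still one-way: the value may only be increased. */
	if ((int)value < c->hide_pid)
		return -EPERM;
	c->hide_pid = (int)value;
	return 0;
}
```

With this, an unprivileged process would call
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) first and then
prctl(PR_SET_HIDEPID, 2).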

> --Andy

Thanks!

--
tixxdz
http://opendz.org