Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.2)

From: Christian Brauner
Date: Sun Apr 18 2021 - 13:21:37 EST


On Sat, Apr 17, 2021 at 03:04:34PM -0500, Serge Hallyn wrote:
> A process running as uid 0 but without cap_setfcap currently can simply
> unshare a new user namespace with uid 0 mapped to 0. While this task
> will not have new capabilities against the parent namespace, there is
> a loophole due to the way namespaced file capabilities work. File
> capabilities valid in userns 1 are distinguised from file capabilities
> valid in userns 2 by the kuid which underlies uid 0. Therefore
> the restricted root process can unshare a new self-mapping namespace,
> add a namespaced file capability onto a file, then use that file
> capability in the parent namespace.
>
> To prevent that, do not allow mapping uid 0 if the process which
> opened the uid_map file does not have CAP_SETFCAP, which is the capability
> for setting file capabilities.
>
> A further wrinkle: a task can unshare its user namespace, then
> open its uid_map file itself, and map (only) its own uid. In this
> case we do not have the credential from before unshare, which was
> potentially more restricted. So, when creating a user namespace, we
> record whether the creator had CAP_SETFCAP. Then we can use that
> during map_write().
>
> With this patch:
>
> 1. unprivileged user can still unshare -Ur
>
> ubuntu@caps:~$ unshare -Ur
> root@caps:~# logout
>
> 2. root user can still unshare -Ur
>
> ubuntu@caps:~$ sudo bash
> root@caps:/home/ubuntu# unshare -Ur
> root@caps:/home/ubuntu# logout
>
> 3. root user without CAP_SETFCAP cannot unshare -Ur:
>
> root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> unable to set CAP_SETFCAP effective capability: Operation not permitted
> root@caps:/home/ubuntu# unshare -Ur
> unshare: write failed /proc/self/uid_map: Operation not permitted
>
> Signed-off-by: Serge Hallyn <serge@xxxxxxxxxx>
>
> Changelog:
> * fix logic in the case of writing to another task's uid_map
> * rename 'ns' to 'map_ns', and make a file_ns local variable
> * use /* comments */
> * update the CAP_SETFCAP comment in capability.h
> * rename parent_unpriv to parent_can_setfcap (and reverse the
> logic)
> * remove printks
> * clarify (i hope) the code comments
> * update capability.h comment
> * renamed parent_can_setfcap to parent_could_setfcap
> * made the check its own disallowed_0_mapping() fn
> * moved the check into new_idmap_permitted
> * rename disallowed_0_mapping to verify_root_mapping
> * change verify_root_mapping to Christian's suggested flow
> ---

Thank you. This looks good. I tested this with:

- fstests
- LXD testsuite
- Podman testsuite
- libcap testsuite

Tested-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
Reviewed-by: Christian Brauner <christian.brauner@xxxxxxxxxx>