Re: [PATCH 7/8] cgroup: Add documentation for cgroup namespaces

From: Serge Hallyn
Date: Mon Dec 28 2015 - 16:13:37 EST


On Mon Dec 28 2015 09:47:35 AM PST, Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello,
>
> I did some heavy editing of the documentation.  How does this look?

Thanks Tejun, just three things (which come from my version):

> Did I miss anything?
>
> Thanks.
> ---
>  Documentation/cgroup.txt |  146 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 146 insertions(+)
>
> --- a/Documentation/cgroup.txt
> +++ b/Documentation/cgroup.txt
> @@ -47,6 +47,11 @@ CONTENTS
>    5-3. IO
>      5-3-1. IO Interface Files
>      5-3-2. Writeback
> +6. Namespace
> +  6-1. Basics
> +  6-2. The Root and Views
> +  6-3. Migration and setns(2)
> +  6-4. Interaction with Other Namespaces
>  P. Information on Kernel Programming
>    P-1. Filesystem Support for Writeback
>  D. Deprecated v1 Core Features
> @@ -1013,6 +1018,147 @@ writeback as follows.
>    vm.dirty[_background]_ratio.
>
>
> +6. Namespace
> +
> +6-1. Basics
> +
> +cgroup namespace provides a mechanism to virtualize the view of the
> +"/proc/$PID/cgroup" file

and cgroup mounts

>.  The CLONE_NEWCGROUP clone flag can be used
> +with clone(2) and unshare(2) to create a new cgroup namespace.  The
> +process running inside the cgroup namespace will have its
> +"/proc/$PID/cgroup" output restricted to cgroupns root.  The cgroupns
> +root is the cgroup of the process at the time of creation of the
> +cgroup namespace.
> +
> +Without cgroup namespace, the "/proc/$PID/cgroup" file shows the
> +complete path of the cgroup of a process.  In a container setup where
> +a set of cgroups and namespaces are intended to isolate processes, the
> +"/proc/$PID/cgroup" file may leak potential system level information
> +to the isolated processes.  For example:
> +
> +  # cat /proc/self/cgroup
> +  0::/batchjobs/container_id1
> +
> +The path '/batchjobs/container_id1' can be considered as system-data
> +and undesirable to expose to the isolated processes.  cgroup namespace
> +can be used to restrict visibility of this path.  For example, before
> +creating a cgroup namespace, one would see:
> +
> +  # ls -l /proc/self/ns/cgroup
> +  lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
> +  # cat /proc/self/cgroup
> +  0::/batchjobs/container_id1
> +
> +After unsharing a new namespace, the view changes.
> +
> +  # ls -l /proc/self/ns/cgroup
> +  lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
> +  # cat /proc/self/cgroup
> +  0::/
> +
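
For anyone scripting against these two files, here is a minimal Python sketch of how the outputs above could be parsed.  It is not part of the patch.  The function and class names are my own (hypothetical), and it assumes the "hierarchy-ID:controller-list:path" line format and the "cgroup:[inode]" link-target format shown in the examples.

```python
import re
from typing import NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: tuple
    path: str


def parse_cgroup_line(line):
    """Parse one /proc/$PID/cgroup line such as '0::/batchjobs/container_id1'."""
    # Split on the first two colons only; the path itself may contain ':'.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    # The controller list is empty for the v2 hierarchy ("0::...").
    names = tuple(c for c in controllers.split(",") if c)
    return CgroupEntry(int(hierarchy_id), names, path)


def parse_ns_link(target):
    """Extract the inode number from a link target like 'cgroup:[4026531835]'."""
    match = re.fullmatch(r"cgroup:\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return int(match.group(1))
```

On a live system the inputs would come from reading /proc/self/cgroup and from os.readlink("/proc/self/ns/cgroup"); a changed inode number after unshare is what the two "ls -l" outputs above show.
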
> +When some thread from a multi-threaded process unshares its cgroup
> +namespace, the new cgroupns gets applied to the entire process (all
> +the threads).  This is natural for the v2 hierarchy; however, for the
> +legacy hierarchies, this may be unexpected.
> +
> +A cgroup namespace is alive as long as there are processes inside it.

Or mounts pinning it.

> +When the last process exits

or the last mount is umounted,

>, the cgroup namespace is destroyed.  The
> +cgroupns root and the actual cgroups remain.
> +
> +
> +6-2. The Root and Views
> +
> +The 'cgroupns root' for a cgroup namespace is the cgroup in which the
> +process calling unshare(2) is running.  For example, if a process in
> +/batchjobs/container_id1 cgroup calls unshare, cgroup
> +/batchjobs/container_id1 becomes the cgroupns root.  For the
> +init_cgroup_ns, this is the real root ('/') cgroup.
> +
> +The cgroupns root cgroup does not change even if the namespace creator
> +process later moves to a different cgroup.
> +
> +  # ~/unshare -c # unshare cgroupns in some cgroup
> +  # cat /proc/self/cgroup
> +  0::/
> +  # mkdir sub_cgrp_1
> +  # echo 0 > sub_cgrp_1/cgroup.procs
> +  # cat /proc/self/cgroup
> +  0::/sub_cgrp_1
> +
> +Each process gets its namespace-specific view of "/proc/$PID/cgroup".
> +
> +Processes running inside the cgroup namespace will be able to see
> +cgroup paths (in /proc/self/cgroup) only inside their root cgroup.
> +From within an unshared cgroupns:
> +
> +  # sleep 100000 &
> +  [1] 7353
> +  # echo 7353 > sub_cgrp_1/cgroup.procs
> +  # cat /proc/7353/cgroup
> +  0::/sub_cgrp_1
> +
> +From the initial cgroup namespace, the real cgroup path will be
> +visible:
> +
> +  $ cat /proc/7353/cgroup
> +  0::/batchjobs/container_id1/sub_cgrp_1
> +
> +From a sibling cgroup namespace (that is, a namespace rooted at a
> +different cgroup), the cgroup path relative to its own cgroup
> +namespace root will be shown.  For instance, if PID 7353's cgroup
> +namespace root is at '/batchjobs/container_id2', then it will see
> +
> +  # cat /proc/7353/cgroup
> +  0::/../container_id2/sub_cgrp_1
> +
> +Note that the relative path always starts with '/' to indicate that
> +it is relative to the cgroup namespace root of the caller.
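
The rule above can be sketched in a few lines of Python.  This only illustrates the path arithmetic and is not the kernel's implementation; view_path() is a hypothetical helper name, and both arguments are assumed to be absolute cgroup paths as seen from the initial namespace.

```python
import posixpath


def view_path(real_cgroup, ns_root):
    """Return real_cgroup as it would appear from a cgroupns rooted at ns_root."""
    rel = posixpath.relpath(real_cgroup, ns_root)
    # relpath() yields "." when the cgroup is the namespace root itself.
    if rel == ".":
        return "/"
    # The leading '/' stands for the viewer's cgroupns root, not the real root.
    return "/" + rel
```

For example, view_path("/batchjobs/container_id1/sub_cgrp_1", "/batchjobs/container_id1") gives "/sub_cgrp_1", while a cgroup under container_id2 viewed from the same root comes out with a leading "/..".
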
> +
> +
> +6-3. Migration and setns(2)
> +
> +Processes inside a cgroup namespace can move into and out of the
> +namespace root if they have proper access to external cgroups

This really means two things: write DAC access to the cgroupfs files, and access to the directories through a cgroupfs mount.  Not sure if that should be spelled out.

>.  For
> +example, from inside a namespace with cgroupns root at
> +/batchjobs/container_id1, and assuming that the global hierarchy is
> +still accessible inside cgroupns:
> +
> +  # cat /proc/7353/cgroup
> +  0::/sub_cgrp_1
> +  # echo 7353 > batchjobs/container_id2/cgroup.procs
> +  # cat /proc/7353/cgroup
> +  0::/../container_id2
> +
> +Note that this kind of setup is not encouraged.  A task inside cgroup
> +namespace should only be exposed to its own cgroupns hierarchy.
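
One way a container manager might notice this situation is to look for a leading ".." component in the namespace-relative path.  A small Python sketch, where escaped_ns_root() is a hypothetical helper and the input is assumed to be the path field of a /proc/$PID/cgroup line as read from inside the namespace:

```python
def escaped_ns_root(view):
    """True if a namespace-relative cgroup path points outside the cgroupns root."""
    # Paths inside the root look like "/" or "/sub_cgrp_1"; anything that
    # climbs out has ".." as its first component after the leading '/'.
    parts = view.split("/")
    return len(parts) > 1 and parts[1] == ".."
```

With the example above, "/sub_cgrp_1" is inside the root and "/../container_id2" is not.
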
> +
> +setns(2) to another cgroup namespace is allowed when:
> +
> +(a) the process has CAP_SYS_ADMIN against its current user namespace
> +(b) the process has CAP_SYS_ADMIN against the target cgroup
> +    namespace's userns
> +
> +No implicit cgroup changes happen with attaching to another cgroup
> +namespace.  It is expected that someone moves the attaching
> +process under the target cgroup namespace root.
> +
> +
> +6-4. Interaction with Other Namespaces
> +
> +A namespace-specific cgroup hierarchy can be mounted by a process
> +running inside a non-init cgroup namespace.
> +
> +  # mount -t cgroup2 none $MOUNT_POINT
> +
> +This will mount the unified cgroup hierarchy with cgroupns root as the
> +filesystem root.  The process needs CAP_SYS_ADMIN against its user and
> +mount namespaces.
> +
> +The virtualization of /proc/self/cgroup file combined with restricting
> +the view of cgroup hierarchy by namespace-private cgroupfs mount
> +provides a properly isolated cgroup view inside the container.
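
To make the combined effect concrete, here is a rough Python sketch of where a cgroup would show up under such a namespace-private mount.  path_in_private_mount() and its arguments are hypothetical names used only for illustration.  It assumes real cgroup paths as seen from the initial namespace, and treats anything outside the cgroupns root as not reachable through the mount.

```python
import posixpath


def path_in_private_mount(real_cgroup, ns_root, mount_point):
    """Map a real cgroup path to its directory under a cgroupns-private mount.

    Returns None when the cgroup lies outside ns_root, because the mount's
    filesystem root is the cgroupns root.
    """
    rel = posixpath.relpath(real_cgroup, ns_root)
    if rel == ".." or rel.startswith("../"):
        return None
    # rel is "." for the namespace root itself; normpath() folds it away.
    return posixpath.normpath(posixpath.join(mount_point, rel))
```

So with the cgroupns root at /batchjobs/container_id1, sub_cgrp_1 appears directly under the mount point, and container_id2 does not appear at all.
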
> +
> +
>  P. Information on Kernel Programming
>
>  This section contains kernel programming information in the areas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
