[PATCH v11 10/10] cpuset: Restrict sched.partition to first level children of root only

From: Waiman Long
Date: Tue Jul 10 2018 - 10:23:16 EST


Enabling partition on a v2 cpuset has the side effect of affecting the
effective CPUs of its parent which is currently unique to partitioning.
As we are not sure about the repercussion of enabling that globally,
we are now restricting the enabling of sched.partition to the first
level children of the root cgroup in the default hierarchy.

This is done by removing the "cpuset.sched.partition" control file
on cgroups that are not the first level children of the root. A new
show_cfile function pointer is added to the cftype structure. If it
is defined, it will be called to return a boolean value to determine
if the corresponding control file should show up in the cgroup. It
provides a more flexible mechanism to determine the visibility of the
control file than a simple CFTYPE_* flag can do.

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
Documentation/admin-guide/cgroup-v2.rst | 10 +++++-----
include/linux/cgroup-defs.h | 9 +++++++++
kernel/cgroup/cgroup.c | 4 ++++
kernel/cgroup/cpuset.c | 10 ++++++++++
4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst
b/Documentation/admin-guide/cgroup-v2.rst
index f7cde15..cf7cd88 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1585,10 +1585,10 @@ Cpuset Interface Files
Its value will be affected by memory nodes hotplug events.

cpuset.sched.partition
- A read-write single value file which exists on non-root
- cpuset-enabled cgroups. It is a binary value flag that accepts
- either "0" (off) or "1" (on). This flag is set and owned by the
- parent cgroup.
+ A read-write single value file which exists on the first level
+ children of the root cgroup. It is a binary value flag that
+ accepts either "0" (off) or "1" (on). This flag is set and
+ owned by the parent cgroup.

If set, it indicates that the current cgroup is the root of a
new partition or scheduling domain that comprises itself and
@@ -1603,7 +1603,7 @@ Cpuset Interface Files
exclusive, i.e. they are not shared by any of its siblings.
2) The "cpuset.cpus" is also a proper subset of the parent's
"cpuset.cpus.effective".
- 3) The parent cgroup is a partition root.
+ 3) The parent cgroup is the root cgroup.
4) There is no child cgroups with cpuset enabled. This is for
eliminating corner cases that have to be handled if such a
condition is allowed.
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index c0e68f9..be79487 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -565,6 +565,15 @@ struct cftype {
ssize_t (*write)(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);

+ /*
+ * show_cfile(), if defined, will return a boolean value to
+ * determine if the control file should show up in the cgroup.
+ * It provides more flexibility in deciding where the control
+ * file should appear than simple criteria like on-root or
+ * not-on-root.
+ */
+ bool (*show_cfile)(struct cgroup_subsys_state *css);
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lock_class_key lockdep_key;
#endif
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 077370b..0afdab8 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3612,6 +3612,10 @@ static int cgroup_addrm_files(struct
cgroup_subsys_state *css,
if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgroup_parent(cgrp))
continue;

+ /* Should the control file show up in the cgroup */
+ if (cft->show_cfile && !cft->show_cfile(css))
+ continue;
+
if (is_add) {
ret = cgroup_add_file(css, cgrp, cft);
if (ret) {
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 62b7e61..2f85c1e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2106,6 +2106,15 @@ static s64 cpuset_read_s64(struct
cgroup_subsys_state *css, struct cftype *cft)
}

/*
+ * The sched.partition control file should only show up in the first
+ * level children of the root cgroup.
+ */
+static bool cpuset_show_partition(struct cgroup_subsys_state *css)
+{
+ return parent_cs(css_cs(css)) == &top_cpuset;
+}
+
+/*
* for the common functions, 'private' gives the type of file
*/

@@ -2250,6 +2259,7 @@ static s64 cpuset_read_s64(struct
cgroup_subsys_state *css, struct cftype *cft)
.name = "sched.partition",
.read_u64 = cpuset_read_u64,
.write_u64 = cpuset_write_u64,
+ .show_cfile = cpuset_show_partition,
.private = FILE_PARTITION_ROOT,
.flags = CFTYPE_NOT_ON_ROOT,
},
--
1.8.3.1