[patch V3 2/8] add prctl task isolation prctl docs and samples

From: Marcelo Tosatti
Date: Tue Aug 24 2021 - 11:43:06 EST


Add documentation and userspace sample code for prctl
task isolation interface.

Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>

---
Documentation/userspace-api/task_isolation.rst | 211 +++++++++++++++++++++++++
samples/Kconfig | 7
samples/Makefile | 1
samples/task_isolation/Makefile | 9 +
samples/task_isolation/task_isol.c | 83 +++++++++
samples/task_isolation/task_isol.h | 9 +
samples/task_isolation/task_isol_userloop.c | 56 ++++++
7 files changed, 376 insertions(+)

Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/userspace-api/task_isolation.rst
@@ -0,0 +1,281 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Task isolation prctl interface
+===============================
+
+Certain types of applications benefit from running uninterrupted by
+background OS activities. Realtime systems and high-bandwidth networking
+applications with user-space drivers can fall into the category.
+
+To create an OS noise free environment for the application, this
+interface allows userspace to inform the kernel the start and
+end of the latency sensitive application section (with configurable
+system behaviour for that section).
+
+Note: the prctl interface is independent of nohz_full=.
+
+The prctl options are:
+
+
+ - PR_ISOL_FEAT: Retrieve supported features.
+ - PR_ISOL_GET: Retrieve task isolation parameters.
+ - PR_ISOL_SET: Set task isolation parameters.
+ - PR_ISOL_CTRL_GET: Retrieve task isolation state.
+ - PR_ISOL_CTRL_SET: Set task isolation state.
+ - PR_ISOL_GET_INT: Retrieve internal parameters.
+ - PR_ISOL_SET_INT: Retrieve internal parameters.
+
+
+
+Inheritance of the isolation parameters and state, across
+fork(2) and clone(2), can be changed via
+PR_ISOL_GET_INT/PR_ISOL_SET_INT.
+
+At a high-level, task isolation is divided in two steps:
+
+1. Configuration.
+2. Activation.
+
+Section "Userspace support" describes how to use
+task isolation.
+
+In terms of the interface, the sequence of steps to activate
+task isolation are:
+
+1. Retrieve supported task isolation features (PR_ISOL_FEAT).
+2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET).
+3. Activate or deactivate task isolation features
+ (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET).
+4. Optionally configure inheritance (PR_ISOL_GET_INT/PR_ISOL_SET_INT).
+
+This interface is based on ideas and code from the
+task isolation patchset from Alex Belits:
+https://lwn.net/Articles/816298/
+
+--------------------
+Feature description
+--------------------
+
+ - ``ISOL_F_QUIESCE``
+
+ This feature allows quiescing select kernel activities on
+ return from system calls.
+
+---------------------
+Interface description
+---------------------
+
+**PR_ISOL_FEAT**:
+
+ Returns the supported features and feature
+ capabilities, as a bitmask::
+
+ prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
+
+ The 'feat' argument specifies whether to return
+ supported features (if zero), or feature capabilities
+ (if not zero). Possible non-zero values for 'feat' are:
+
+ - ``ISOL_F_QUIESCE``:
+
+ Returns a bitmask containing which kernel
+ activities are supported for quiescing.
+
+ Features and its capabilities are defined at include/uapi/linux/task_isolation.h.
+
+**PR_ISOL_GET**:
+
+ Retrieve task isolation feature configuration.
+ The general format is::
+
+ prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
+
+ The 'feat' argument specifies whether to return
+ configured features (if zero), or individual feature
+ configuration (if not zero). Possible non-zero
+ values for 'feat' are:
+
+ - ``ISOL_F_QUIESCE``:
+
+ Returns a bitmask containing which kernel
+ activities are enabled for quiescing.
+
+
+**PR_ISOL_SET**:
+
+ Configures task isolation features. The general format is::
+
+ prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
+
+ The 'feat' argument specifies which feature to configure.
+ Possible values for feat are:
+
+ - ``ISOL_F_QUIESCE``:
+
+ The 'arg3' argument is a bitmask specifying which
+ kernel activities to quiesce. Possible bit sets are:
+
+ - ``ISOL_F_QUIESCE_VMSTATS``
+
+ VM statistics are maintained in per-CPU counters to
+ improve performance. When a CPU modifies a VM statistic,
+ this modification is kept in the per-CPU counter.
+ Certain activities require a global count, which
+ involves requesting each CPU to flush its local counters
+ to the global VM counters.
+
+ This flush is implemented via a workqueue item, which
+ might schedule a workqueue on isolated CPUs.
+
+ To avoid this interruption, task isolation can be
+ configured to, upon return from system calls, synchronize
+ the per-CPU counters to global counters, thus avoiding
+ the interruption.
+
+ To ensure the application returns to userspace
+ with no modified per-CPU counters, its necessary to
+ use mlockall() in addition to this isolcpus flag.
+
+**PR_ISOL_CTRL_GET**:
+
+ Retrieve task isolation control.
+
+ prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
+
+ Returns which isolation features are active.
+
+**PR_ISOL_CTRL_SET**:
+
+ Activates/deactivates task isolation control.
+
+ prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+
+ The 'mask' argument specifies which features
+ to activate (bit set) or deactivate (bit clear).
+
+ For ISOL_F_QUIESCE, quiescing of background activities
+ happens on return to userspace from the
+ prctl(PR_ISOL_CTRL_SET) call, and on return from
+ subsequent system calls.
+
+ Quiescing can be adjusted (while active) by
+ prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...).
+
+**PR_ISOL_GET_INT**:
+
+ Retrieves task isolation internal parameters.
+
+ The general format is::
+
+ prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5);
+
+ The 'cmd' argument specifies which parameter to configure.
+ Possible values for cmd are:
+
+ - ``INHERIT_CFG``:
+
+ Retrieve inheritance configuration.
+
+ The 'arg3' argument is a pointer to a struct
+ inherit_control::
+
+ struct task_isol_inherit_control {
+ __u8 inherit_mask;
+ __u8 pad[7];
+ };
+
+ See PR_ISOL_SET_INT description below for meaning
+ of structure fields.
+
+**PR_ISOL_SET_INT**:
+
+ Sets task isolation internal parameters.
+
+ The general format is::
+
+ prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5);
+
+ The 'cmd' argument specifies which parameter to configure.
+ Possible values for cmd are:
+
+ - ``INHERIT_CFG``:
+
+ Set inheritance configuration when a new task
+ is created via fork and clone.
+
+ The 'arg3' argument is a pointer to a struct
+ inherit_control::
+
+ struct task_isol_inherit_control {
+ __u8 inherit_mask;
+ __u8 pad[7];
+ };
+
+ inherit_mask is a bitmask that specifies which part
+ of task isolation to be inherited:
+
+ - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
+ This is the stated written via prctl(PR_ISOL_SET, ...).
+
+ - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
+ (requires ISOL_INHERIT_CONF to be set). The new task
+ should behave, right after fork/clone, in the same manner
+ as the parent task _after_ it executed:
+
+ prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+
+ with a valid mask.
+
+==================
+Userspace support
+==================
+
+Task isolation is divided in two main steps: configuration and activation.
+
+Each step can be performed by an external tool or the latency sensitive
+application itself. util-linux contains the "chisol" tool for this
+purpose.
+
+This results in three combinations:
+
+1. Both configuration and activation performed by the
+latency sensitive application.
+Allows fine grained control of what task isolation
+features are enabled and when (see samples section below).
+
+2. Only activation can be performed by the latency sensitive app
+(and configuration performed by chisol).
+This allows the admin/user to control task isolation parameters,
+and applications have to be modified only once.
+
+3. Configuration and activation performed by an external tool.
+This allows unmodified applications to take advantage of
+task isolation. Activation is performed by the "-a" option
+of chisol.
+
+========
+Examples
+========
+
+The ``samples/task_isolation/`` directory contains sample
+code.
+
+This is a snippet of code to activate task isolation if
+it has been previously configured (by chisol for example)::
+
+ #include <sys/prctl.h>
+
+ #ifdef PR_ISOL_GET
+ ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
+ if (ret != -1) {
+ unsigned long mask = ret;
+
+ ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_CTRL_SET");
+ return ret;
+ }
+ }
+ #endif
+
Index: linux-2.6/samples/task_isolation/task_isol.c
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+#include <errno.h>
+#include "task_isol.h"
+
+#ifdef PR_ISOL_FEAT
+int task_isol_setup(void)
+{
+ int ret;
+ int errnosv;
+
+ /* Retrieve supported task isolation features */
+ ret = prctl(PR_ISOL_FEAT, 0, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_FEAT");
+ return ret;
+ }
+ printf("supported features bitmask: 0x%x\n", ret);
+
+ /* Retrieve supported ISOL_F_QUIESCE bits */
+ ret = prctl(PR_ISOL_FEAT, ISOL_F_QUIESCE, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_FEAT (ISOL_F_QUIESCE)");
+ return ret;
+ }
+ printf("supported ISOL_F_QUIESCE bits: 0x%x\n", ret);
+
+ /* Do not configure task isolation attributes if already set */
+ ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
+ errnosv = errno;
+ if (ret != -1) {
+ printf("Task isolation parameters already configured!\n");
+ return ret;
+ }
+ if (ret == -1 && errnosv != ENODATA) {
+ perror("prctl PR_ISOL_GET");
+ return ret;
+ }
+ ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS,
+ 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_SET");
+ return ret;
+ }
+ return ISOL_F_QUIESCE;
+}
+
+int task_isol_ctrl_set(unsigned long mask)
+{
+ int ret;
+
+ ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)");
+ return -1;
+ }
+
+ return 0;
+}
+
+#else
+
+int task_isol_setup(void)
+{
+ return 0;
+}
+
+int task_isol_ctrl_set(unsigned long mask)
+{
+ return 0;
+}
+#endif
+
+
Index: linux-2.6/samples/task_isolation/task_isol.h
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __TASK_ISOL_H
+#define __TASK_ISOL_H
+
+int task_isol_setup(void);
+
+int task_isol_ctrl_set(unsigned long mask);
+
+#endif /* __TASK_ISOL_H */
Index: linux-2.6/samples/task_isolation/task_isol_userloop.c
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol_userloop.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+#include "task_isol.h"
+
+int main(void)
+{
+ int ret;
+ void *buf = malloc(4096);
+ unsigned long mask;
+
+ memset(buf, 1, 4096);
+ ret = mlock(buf, 4096);
+ if (ret) {
+ perror("mlock");
+ return EXIT_FAILURE;
+ }
+
+ ret = task_isol_setup();
+ if (ret)
+ return EXIT_FAILURE;
+ mask = ret;
+
+ ret = prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)");
+ return EXIT_FAILURE;
+ }
+
+#define NR_LOOPS 999999999
+#define NR_PRINT 100000000
+ /* busy loop */
+ while (ret < NR_LOOPS) {
+ memset(buf, 0, 4096);
+ ret = ret+1;
+ if (!(ret % NR_PRINT))
+ printf("loops=%d of %d\n", ret, NR_LOOPS);
+ }
+
+ ret = prctl(PR_ISOL_CTRL_SET, 0, 0, 0, 0);
+ if (ret == -1) {
+ perror("prctl PR_ISOL_CTRL_SET (0)");
+ exit(0);
+ }
+
+ return EXIT_SUCCESS;
+}
+
Index: linux-2.6/samples/Kconfig
===================================================================
--- linux-2.6.orig/samples/Kconfig
+++ linux-2.6/samples/Kconfig
@@ -223,4 +223,11 @@ config SAMPLE_WATCH_QUEUE
Build example userspace program to use the new mount_notify(),
sb_notify() syscalls and the KEYCTL_WATCH_KEY keyctl() function.

+config SAMPLE_TASK_ISOLATION
+ bool "task isolation sample"
+ depends on CC_CAN_LINK && HEADERS_INSTALL
+ help
+ Build example userspace program to use prctl task isolation
+ interface.
+
endif # SAMPLES
Index: linux-2.6/samples/Makefile
===================================================================
--- linux-2.6.orig/samples/Makefile
+++ linux-2.6/samples/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_SAMPLE_INTEL_MEI) += mei/
subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog
subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/
+subdir-$(CONFIG_SAMPLE_TASK_ISOLATION) += task_isolation
Index: linux-2.6/samples/task_isolation/Makefile
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+userprogs-always-y += task_isol_userloop
+task_isol_userloop-objs := task_isol.o task_isol_userloop.o
+
+userccflags += -I usr/include
+
+
+#$(CC) $^ -o $@