[RFC][PATCH 11/11] sched: add sched_dl documentation.

From: Raistlin
Date: Sun Feb 28 2010 - 14:28:27 EST


Add to Documentation/scheduler/ some notes on the design choices,
the usage, and the possible future developments of the sched_dl
scheduling class and of the SCHED_DEADLINE policy.

Signed-off-by: Dario Faggioli <raistlin@xxxxxxxx>
---
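Note for reviewers (not part of the patch): the admission control limits
described in sections 2.1 and 2.3 of the new document can be sketched as
below. This is only an illustrative model. It assumes that a bandwidth is
simply runtime / period and that the system wide limit is meant to read
rt_runtime / rt_period + dl_runtime / dl_period <= 100%. The helper names
(bandwidth, system_limit_ok, group_limit_ok) are hypothetical and do not
exist in the kernel, and all the numbers are example values.

```python
from fractions import Fraction


def bandwidth(runtime_us, period_us):
    """Fraction of one CPU described by a runtime over a period."""
    return Fraction(runtime_us, period_us)


def system_limit_ok(rt_runtime, rt_period, dl_runtime, dl_period):
    """System wide check: -rt plus -deadline bandwidth must fit in one CPU."""
    total = bandwidth(rt_runtime, rt_period) + bandwidth(dl_runtime, dl_period)
    return total <= 1


def group_limit_ok(parent, children):
    """Group check: every subgroup stays below 100% and the subgroups
    together stay within the bandwidth of their parent group."""
    child_bw = [bandwidth(runtime, period) for runtime, period in children]
    return all(bw < 1 for bw in child_bw) and sum(child_bw) <= bandwidth(*parent)


# Example values only: 95% reserved for -rt, 5% for -deadline.
print(system_limit_ok(950000, 1000000, 50000, 1000000))
# The same -rt share with a 50% -deadline share does not fit.
print(system_limit_ok(950000, 1000000, 500000, 1000000))
# Two subgroups (2% and 3%) inside a parent group that owns 5%.
print(group_limit_ok((50000, 1000000), [(20000, 1000000), (3000, 100000)]))
```

Exact fractions are used instead of floats so that a configuration sitting
exactly on the limit is not rejected because of rounding.
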
Documentation/scheduler/sched-deadline.txt | 188 ++++++++++++++++++++++++++++
init/Kconfig | 1 +
2 files changed, 189 insertions(+), 0 deletions(-)
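
Note for reviewers (not part of the patch): the task model of section 1.1
of the new document can be illustrated with the small sketch below. The
absolute deadline of an instance is its activation time plus the relative
deadline, and EDF runs the instance with the smallest absolute deadline.
The Instance class and the edf_pick helper are hypothetical names used for
illustration only; CBS budget enforcement is deliberately not modelled, and
the time values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    activation: int         # time instant at which the instance activates
    relative_deadline: int  # interval within which it must complete

    @property
    def absolute_deadline(self):
        # Recomputed for each instance: activation plus relative deadline.
        return self.activation + self.relative_deadline


def edf_pick(ready):
    """Return the ready instance with the smallest absolute deadline."""
    return min(ready, key=lambda inst: inst.absolute_deadline)


ready = [
    Instance("video", activation=0, relative_deadline=40),
    Instance("audio", activation=5, relative_deadline=10),
    Instance("control", activation=2, relative_deadline=20),
]
# Absolute deadlines are 40, 15 and 22, so "audio" is selected.
print(edf_pick(ready).name)
```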

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
new file mode 100644
index 0000000..1ff0e1e
--- /dev/null
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -0,0 +1,188 @@
+ Deadline Task and Group Scheduling
+ ----------------------------------
+
+CONTENTS
+========
+
+0. WARNING
+1. Overview
+ 1.1 Task scheduling
+ 1.2 Group scheduling
+2. The interface
+ 2.1 System wide settings
+ 2.2 Task interface
+ 2.3 Group interface
+ 2.4 Default behavior
+3. Future plans
+
+
+0. WARNING
+==========
+
+ Fiddling with these settings can result in unpredictable or even unstable
+ system behavior. As with -rt (group) scheduling, it is assumed that root
+ users know what they are doing.
+
+
+1. Overview
+===========
+
+ The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
+ basically an implementation of the Earliest Deadline First (EDF) scheduling
+ algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
+ that makes it possible to isolate the behaviour of tasks from one another.
+
+
+1.1 Task scheduling
+-------------------
+
+ The typical -deadline task is made up of a computation phase (instance)
+ which is activated in a periodic or sporadic fashion. The expected (maximum)
+ duration of such computation is called the task's runtime; the time interval
+ within which each instance needs to be completed is called the task's relative
+ deadline. The task's absolute deadline is dynamically calculated as the
+ time instant a task (better, an instance) activates plus the relative
+ deadline.
+
+ The EDF algorithm selects the task with the smallest absolute deadline as
+ the one to be executed first, while the CBS ensures that each task runs for
+ at most its runtime in every time interval as long as its (relative) deadline,
+ avoiding any interference between different tasks (bandwidth isolation).
+ Thanks to this feature, even tasks that do not strictly comply with the
+ computational model sketched above can effectively use the new policy.
+ IOW, there are no limitations on what kind of task can exploit this new
+ scheduling discipline, even if it must be said that it is particularly
+ suited for periodic or sporadic tasks that need guarantees on their
+ timing behaviour, e.g., multimedia, streaming, control applications, etc.
+
+
+1.2 Group scheduling
+--------------------
+
+ For -deadline scheduling to be effective and useful, it is important to
+ have some method of keeping the allocation of the available CPU bandwidth to
+ tasks and task groups under control.
+ This is usually called "admission control"; if it is not performed at all,
+ no guarantee can be given on the actual scheduling of the -deadline tasks.
+
+ Since RT-throttling was introduced, each task group has a bandwidth
+ associated with it, calculated as a certain amount of runtime over a
+ period. Moreover, to make it possible to manipulate such bandwidth,
+ readable/writable controls have been added to both procfs (for system
+ wide settings) and cgroupfs (for per-group settings).
+ Therefore, the same interface is being used for controlling the bandwidth
+ distribution to -deadline tasks and task groups, i.e., new controls with
+ similar names, equivalent meaning and the same usage paradigm are
+ added.
+
+ The main difference between deadline bandwidth management and RT-throttling
+ is that -deadline tasks have a bandwidth of their own (while -rt ones don't!),
+ and thus we don't need a throttling mechanism in the groups, which are
+ used only for admission control of tasks.
+
+ This means that what we check is that the sum of the bandwidths of all the
+ tasks belonging to the group stays, on each CPU, below the bandwidth of the
+ group itself.
+
+
+2. The interface
+================
+
+2.1 System wide settings
+------------------------
+
+The system wide settings are configured under the /proc virtual file system.
+
+ The system wide controls that are added to procfs are:
+ * /proc/sys/kernel/sched_dl_runtime_us,
+ * /proc/sys/kernel/sched_dl_period_us,
+ * /proc/sys/kernel/sched_dl_total_bw.
+
+ The first two accept (if written) and provide (if read) the new runtime and
+ period, respectively, for each CPU.
+ The last one accepts (if written) the index of one online CPU, and it provides
+ (if read) the total amount of bandwidth currently allocated on that CPU.
+
+ Settings are checked against the following limit:
+
+ * for the whole system, on each CPU:
+ rt_runtime / rt_period + dl_runtime / dl_period <= 100%
+
+
+2.2 Task interface
+------------------
+
+ Specifying a periodic/sporadic task that executes for a given amount of
+ runtime at each instance, and that is scheduled according to the urgency of
+ its own timing constraints, needs, in general, a way of declaring:
+ - a (maximum/typical) instance execution time,
+ - a minimum interval between consecutive instances,
+ - a time constraint by which each instance must be completed.
+
+ Therefore:
+ * a new struct sched_param_ex, containing all the necessary fields, is
+ provided;
+ * the new scheduling related syscalls that manipulate it, i.e.,
+ sched_setscheduler_ex(), sched_setparam_ex() and sched_getparam_ex()
+ are implemented.
+
+
+2.3 Group interface
+-------------------
+
+ The per-group controls that are added to the cgroupfs virtual file system are:
+ * /cgroup/<cgroup>/cpu.dl_runtime_us,
+ * /cgroup/<cgroup>/cpu.dl_period_us,
+ * /cgroup/<cgroup>/cpu.dl_total_bw.
+
+ The first two accept (if written) and provide (if read) the new runtime and
+ period, respectively, of the group for each CPU.
+ The last one accepts (if written) the index of one online CPU, and it provides
+ (if read) the total amount of bandwidth currently allocated inside the group
+ on that CPU.
+
+ Group settings are checked against the following limits:
+
+ * for the root group {r}, on each CPU:
+ dl_runtime_{r} / dl_period_{r} <= dl_runtime / dl_period
+
+ * for each group {i}, subgroup of group {j}, on each CPU:
+ dl_runtime_{i} / dl_period_{i} < 100%
+ \Sum_{i} dl_runtime_{i} / dl_period_{i} <= dl_runtime_{j} / dl_period_{j}
+
+ For more information on working with control groups, see
+ Documentation/cgroups/cgroups.txt.
+
+
+2.4 Default behavior
+--------------------
+
+The default values for system wide and root group dl_runtime and dl_period are
+50000 over 1000000. This means -deadline tasks and task groups can use at
+most 5% bandwidth on each CPU.
+
+When a -deadline task forks a child, the child's dl_runtime is set to 0, which
+means someone must call sched_setscheduler_ex() on it, or it won't even start.
+
+When a new group is created, its dl_runtime is 0, which means someone must
+(try to) increase it before tasks can be added to the group.
+
+
+3. Future plans
+===============
+
+Still missing parts:
+
+ - bandwidth reclaiming mechanisms, i.e., methods that avoid stopping the
+ tasks until their next deadline when overrunning. There are at least
+ three of them that are very simple, and patches are on their way;
+
+ - migration of tasks through push and pull (as in -rt) to make it
+ possible to deploy global-EDF scheduling. Patches are ready, they're
+ just being tested and adapted to this latest version;
+
+ - refinements in deadline inheritance, especially regarding the possibility
+ of retaining bandwidth isolation among non-interacting tasks. This is
+ being studied from both theoretical and practical points of view, and
+ hopefully we can have some demonstrative code soon.
+
diff --git a/init/Kconfig b/init/Kconfig
index de57415..377caed 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -486,6 +486,7 @@ config DEADLINE_GROUP_SCHED
tasks (and other groups) can be added to it only up to such
"bandwidth cap", which might be useful for avoiding or
controlling oversubscription.
+ See Documentation/scheduler/sched-deadline.txt for more.

choice
depends on GROUP_SCHED
--
1.7.0

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / raistlin@xxxxxxxxx /
dario.faggioli@xxxxxxxxxx
