[RFC 11/12][PATCH] SCHED_DEADLINE: documentation

From: Raistlin
Date: Fri Oct 16 2009 - 11:48:36 EST


This commit adds some more documentation and comments on how the new
scheduling policy works.

Signed-off-by: Raistlin <raistlin@xxxxxxxx>
---
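Not part of the commit: the group limits listed in section 2.3 of the new
document boil down to a bandwidth comparison. Below is a minimal user-space
sketch of that check, not the code in this series. struct dl_bw, to_ratio()
and children_fit() are hypothetical names, and the 20-bit fixed-point scale
is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (runtime, period) pair, both in microseconds. */
struct dl_bw {
	uint64_t runtime_us;
	uint64_t period_us;
};

/*
 * Bandwidth as a fixed-point fraction with 20 fractional bits (assumed
 * scale). The division truncates, so each ratio is rounded down.
 */
static uint64_t to_ratio(struct dl_bw bw)
{
	if (bw.period_us == 0)
		return 0;	/* a period of 0 is treated as "no bandwidth" */
	return (bw.runtime_us << 20) / bw.period_us;
}

/* True if the summed bandwidth of the children fits into the parent's. */
static bool children_fit(struct dl_bw parent, const struct dl_bw *child,
			 size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += to_ratio(child[i]);
	return sum <= to_ratio(parent);
}
```

With a parent at 500000/1000000 us, children at 100000/1000000 and
200000/1000000 fit, while two children at 300000/1000000 each do not.
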
Documentation/scheduler/sched-deadline.txt | 174 ++++++++++++++++++++++++++++
include/linux/sched.h | 45 +++++++
init/Kconfig | 1 +
3 files changed, 220 insertions(+), 0 deletions(-)
create mode 100644 Documentation/scheduler/sched-deadline.txt
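
Also not part of the commit: the activation, throttling and replenishment
rules described in the sched.h comment below can be modelled in a few lines
of user-space C. This is only a sketch under stated assumptions: struct
dl_task and the dl_*() helpers are hypothetical, the new deadline after
throttling is assumed to be the old one plus sched_deadline, and the plain
"<" comparison ignores clock wrap-around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state; all times in ns on one monotonic clock. */
struct dl_task {
	uint64_t sched_runtime;		/* budget granted per instance */
	uint64_t sched_deadline;	/* relative deadline */
	uint64_t runtime;		/* budget left in this instance */
	uint64_t deadline;		/* absolute deadline */
	bool throttled;
};

/* Activation at time now: fresh budget, deadline = now + sched_deadline. */
static void dl_activate(struct dl_task *t, uint64_t now)
{
	t->runtime = t->sched_runtime;
	t->deadline = now + t->sched_deadline;
	t->throttled = false;
}

/* Charge delta ns of execution; stop the task once the budget is gone. */
static void dl_account(struct dl_task *t, uint64_t delta)
{
	t->runtime = delta < t->runtime ? t->runtime - delta : 0;
	if (t->runtime == 0)
		t->throttled = true;
}

/* At the deadline of a throttled task: new budget, new deadline (assumed). */
static void dl_replenish(struct dl_task *t)
{
	t->runtime = t->sched_runtime;
	t->deadline += t->sched_deadline;
	t->throttled = false;
}

/* EDF order: the task with the earlier absolute deadline runs first. */
static bool dl_before(const struct dl_task *a, const struct dl_task *b)
{
	return a->deadline < b->deadline;
}
```

A task with sched_runtime = 10 and sched_deadline = 100 that activates at
t = 1000 gets deadline 1100; once it has consumed its 10 units it is
throttled, and after replenishment its deadline becomes 1200.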

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
new file mode 100644
index 0000000..cadfa9f
--- /dev/null
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -0,0 +1,174 @@
+ Deadline Task and Group Scheduling
+ ----------------------------------
+
+CONTENTS
+========
+
+0. WARNING
+1. Overview
+ 1.1 Task scheduling
+ 1.2 Group scheduling
+2. The interface
+ 2.1 System-wide settings
+ 2.2 Default behavior
+ 2.3 Basis for grouping tasks
+3. Future plans
+
+
+0. WARNING
+==========
+
+ Fiddling with these settings can result in unpredictable or even unstable
+ system behavior. As for -rt (group) scheduling, it is assumed that root
+ knows what he is doing.
+
+
+1. Overview
+===========
+
+The SCHED_DEADLINE scheduling class implements the Earliest Deadline First
+(EDF) algorithm and uses the Constant Bandwidth Server (CBS) to provide
+bandwidth isolation among tasks.
+The implementation is aligned with the current mainstream kernel, and it
+relies on standard Linux mechanisms (e.g., control groups) to natively support
+multicore platforms and to provide hierarchical scheduling through a standard
+API.
+
+
+1.1 Task scheduling
+-------------------
+
+The SCHED_DEADLINE scheduling class does not make any restrictive assumption
+on the characteristics of the tasks, thus it can handle:
+ * periodic tasks, typical in real-time and control applications;
+ * sporadic tasks, typical in soft real-time and multimedia applications;
+ * aperiodic tasks.
+
+This is mainly because temporal isolation is ensured: the temporal behavior
+of each task (i.e., its ability to meet deadlines) is not affected by what
+happens in any other task in the system.
+In other words, even if a task misbehaves, it cannot consume more execution
+time than the amount that has been allocated to it.
+
+In fact, each task is assigned a ``scheduling budget'' (sched_runtime) and a
+``scheduling deadline'' (sched_deadline, also called period in this branch
+of the real-time literature).
+This means the task is guaranteed to execute for an amount of time equal to
+sched_runtime every sched_deadline, i.e., to utilize at most a CPU bandwidth
+equal to sched_runtime/sched_deadline.
+If it tries to execute for more than its sched_runtime it is throttled, i.e.,
+it is stopped until the time instant of its next deadline.
+
+However, although this algorithm (i.e., the CBS) is effective for encapsulating
+aperiodic or sporadic --real-time or non real-time-- tasks in a real-time
+EDF scheduled system, it imposes some overhead on ``standard'' periodic tasks.
+Therefore, we make it possible for periodic tasks to specify that they are going
+to sleep, waiting for the next activation, because a periodic instance just
+ended. This prevents them (provided they behave well!) from being disturbed by
+the CBS bandwidth management logic.
+
+
+1.2 Group scheduling
+--------------------
+
+The scheduling class is integrated with the control groups mechanism in order
+to allow the creation of groups of tasks with a cap on their total utilization.
+
+However, groups play no role in the on-line scheduling decisions. This is
+different from how group scheduling works for the -rt scheduling class, and
+the difference comes from the fact that -deadline tasks _already_ have their
+own bandwidth, which is not true for standard POSIX SCHED_FIFO or SCHED_RR
+processes and threads.
+
+Therefore, there is no need for a fully hierarchical runqueue implementation,
+hierarchical runtime accounting, etc., which results in simpler code and
+smaller overhead.
+All we do is perform bandwidth ``consistency checks'' at the occurrence of the
+following events:
+ * a -deadline task is created or moved inside a group,
+ * the parameters of a -deadline task (if inside a group) are modified,
+ * the -deadline related parameters of a group are modified.
+
+The purpose of this is to ensure that the cumulative utilization of tasks and
+groups does not exceed that of the group containing them (see below).
+
+
+2. The interface
+================
+
+
+2.1 System-wide settings
+------------------------
+
+The system-wide settings are configured under the /proc virtual file system:
+
+/proc/sys/kernel/sched_deadline_period_us:
+ The scheduling period that is equivalent to 100% CPU bandwidth
+
+/proc/sys/kernel/sched_deadline_runtime_us:
+ A global limit on how much time -deadline scheduling may use. Even without
+ CONFIG_DEADLINE_GROUP_SCHED enabled, this will limit time reserved to
+ -deadline processes. With CONFIG_DEADLINE_GROUP_SCHED it signifies the
+ total bandwidth available to all -deadline groups.
+
+ * Time is specified in us because the interface is s32. This gives an
+ operating range from 1us to about 35 minutes;
+ * sched_deadline_period_us takes values from 1 to INT_MAX;
+ * sched_deadline_runtime_us takes values from 1 to INT_MAX;
+ * setting runtime = period specifies 100% bandwidth exploitable by
+ -deadline tasks;
+ * setting runtime > period allows for more than 100% bandwidth
+ exploitable by -deadline tasks, which still might make sense,
+ especially in SMP systems.
+
+
+2.2 Default behavior
+--------------------
+
+The default values for sched_deadline_period_us and
+sched_deadline_runtime_us are 0. This means no -deadline tasks or
+groups can be created!
+
+Consistently, bandwidth assigned to the root group, and to each newly created
+group, is 0 as well.
+
+
+2.3 Basis for grouping tasks
+----------------------------
+
+There are two compile-time settings for allocating CPU bandwidth. These are
+configured using the "Basis for grouping tasks" multiple choice menu under
+General setup > Group CPU Scheduler:
+
+CONFIG_USER_SCHED (aka "Basis for grouping tasks" = "user id")
+
+This, for now, is not supported for deadline group scheduling.
+
+CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups")
+
+This uses the /cgroup virtual file system, i.e.:
+ * /cgroup/<cgroup>/cpu.deadline_runtime_us and
+ * /cgroup/<cgroup>/cpu.deadline_period_us,
+to control the CPU time reserved for each control group.
+
+For more information on working with control groups, you should read
+Documentation/cgroups/cgroups.txt as well.
+
+Group settings are checked against the following limits:
+
+ * for the root group {r}
+ runtime_{r} / period_{r} <= global_runtime / global_period
+ * for each group {j}, summing over all of its subgroups {i}
+ \Sum_{i} runtime_{i} / period_{i} <= runtime_{j} / period_{j}
+
+
+3. Future plans
+===============
+
+Only two, but very important, pieces are missing:
+
+ * SMP/multicore global scheduling through push and pull logic (as in
+ -rt). This is not finished, but is on its way, and will come very soon!
+ * Deadline/BandWidth Inheritance and/or Proxy Execution mechanisms for the
+ rt_mutexes. This probably needs some more discussion, and also some more
+ time to be implemented!
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4de72eb..ec0324f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -95,6 +95,51 @@ struct sched_param {

#include <asm/processor.h>

+/*
+ * Extended sched_param for SCHED_DEADLINE tasks.
+ *
+ * struct sched_param itself cannot be modified, for binary compatibility
+ * reasons.
+ *
+ * A SCHED_DEADLINE task has at least a scheduling deadline (sched_deadline)
+ * and a scheduling runtime (sched_runtime). Space for a scheduling
+ * period (sched_period) is reserved, but the field is not used right now.
+ *
+ * When a SCHED_DEADLINE task activates at time t, its absolute deadline is
+ * computed as:
+ * deadline = t + sched_deadline.
+ * The SCHED_DEADLINE runqueue is ordered by ascending task deadline
+ * values, thus the task with the _earliest_ deadline is the one
+ * that will be scheduled.
+ *
+ * In order to prevent one task from interfering with the others, each
+ * task activation is allowed to run for at most its runtime, which never
+ * exceeds sched_runtime.
+ * After that, the task is stopped until its deadline, when it is reactivated
+ * with a new 'runtime quota' and a new deadline.
+ *
+ * Period (or minimum interarrival time) is not dealt with in the kernel, and
+ * it is up to the user to make the task suspend at the end of each instance.
+ * The sched_wait_interval() syscall --with clock_nanosleep-like semantics--
+ * can be used for this purpose. In this case, when the task resumes, the
+ * scheduler assumes a new instance is just starting, and provides the task
+ * with new runtime and deadline values.
+ *
+ * Scheduling flags, finally, let the user specify whether runtime overruns
+ * (which may occur, e.g., because of timing resolution issues) and/or
+ * deadline misses (e.g., because the system is oversubscribed) must be
+ * notified by means of SIGXCPU signals.
+ *
+ * @sched_priority: not used right now
+ *
+ * @sched_deadline: scheduling deadline of the task
+ * @sched_runtime: scheduling runtime of the task
+ * @sched_period: not used right now
+ *
+ * @sched_flags: scheduling flags of the task (runtime overrun and/or
+ * deadline miss only, for now)
+ */
+
#define SCHED_SIG_RORUN 0x80000000
#define SCHED_SIG_DMISS 0x40000000

diff --git a/init/Kconfig b/init/Kconfig
index 17318ca..d4a52b7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -467,6 +467,7 @@ config DEADLINE_GROUP_SCHED
tasks (and other groups) can be added to it only up to such
``bandwidth cap'', which might be useful for avoiding or
controlling oversubscription.
+ See Documentation/scheduler/sched-deadline.txt for more.

choice
depends on GROUP_SCHED
--
1.6.0.4
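
The numbers quoted in section 2.1 of the new document can be checked with a
few lines of C. A sketch only: s32_us_max_minutes() and dl_bw_percent() are
hypothetical helpers written for this note, not part of the patch.

```c
#include <stdint.h>

/* Whole minutes covered by the largest s32 microsecond value (2^31 - 1). */
static int64_t s32_us_max_minutes(void)
{
	return (int64_t)INT32_MAX / 1000000 / 60;
}

/* Bandwidth in percent for a runtime/period pair; 0 if period is not set. */
static int64_t dl_bw_percent(int32_t runtime_us, int32_t period_us)
{
	if (period_us <= 0)
		return 0;
	return (int64_t)runtime_us * 100 / period_us;
}
```

s32_us_max_minutes() yields 35, matching the "about 35 minutes" range, and
dl_bw_percent(2000000, 1000000) yields 200, i.e. the runtime > period case
that may make sense on SMP systems.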


--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / raistlin@xxxxxxxxx /
dario.faggioli@xxxxxxxxxx
