Re: RFC for a new Scheduling policy/class in the Linux-kernel

From: Raj Rajkumar
Date: Thu Jul 16 2009 - 13:54:36 EST

Next message: Ira W. Snyder: "Re: [PATCH 2/4] edac: mpc85xx add mpc83xx support"
Previous message: Rajiv Andrade: "[PATCH] TPM: DATA_EXPECT bit check bypass"
In reply to: Ted Baker: "Re: RFC for a new Scheduling policy/class in the Linux-kernel"
Next in thread: Raistlin: "Re: RFC for a new Scheduling policy/class in the Linux-kernel"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Dear Jim, Ted and others:

Let me first introduce myself. I work on real-time systems at Carnegie Mellon Univ and am one of the authors on the papers that introduced the priority inheritance and priority ceiling protocols, and some of their follow-ons. I was also a co-founder of TimeSys in 1996. My PhD student, Karthik Lakshmanan, and an SEI Sr. Staff Member, Dr. Dionisio de Niz, and I have begun to revisit multiprocessor scheduling and synchronization thanks in part to the surging interest in multicore processors.

I just came to know about this message thread. I hope, as Jim did, that I can add something to the ongoing discussion - if not, please bear with me.

First of all, let me agree completely with Jim that simplicity goes a very long way. Two relevant examples follow. First, fixed-priority scheduling has dominated most real-time systems over dynamic-priority scheduling) since (a) the practical performance differences are not significant (b) overheads are lower and (c) perhaps above all, less information is required. Secondly, priority ceiling/ceiling emulation protocols have much better properties than basic priority inheritance protocols but the latter are more widely used. The reason appears to be that PCP requires additional information which can also change at run-time. Overall, simplicity does win.

Secondly, let me fully agree with Ted in that Linux-RT extensions should support bandwidth control & isolation. My group's work on "reserves" and "resource kernels" looked at isolating both temporal and spatial resources (please see http://www.ece.cmu.edu/~raj/publications.html#ResourceKernels) in the context of Linux. Karthik's work extended Linux/RK to distributed systems (Distributed Linux/RK).

Thirdly, I also agree with Jim that non-preemptive locking and FIFO queueing are fine for mutexes in the kernel. The primary reason is that critical sections in the kernel are written by experts and tend to be short. And as it turns out, this is exactly how Linux SMP support has been for quite a few years.

It looks to me like Jim and Bjoern name the kernel-mutex locking scheme (of non-preemption and FIFO queueing) as FMLP and advocate it for user-level mutexes. Jim: Please correct me if my interpretation is incorrect.

I would like to propose a solution with 2 components. First, priority inheritance protocols not just prevent (potentially) unbounded priority inversion but offer less blocking for higher-priority tasks with typically tighter timing constraints. It is also well known that non-preemptive execution is an extreme/simplified form of the priority ceiling protocol, where every task is assumed to access every shared resource/mutex and hence every priority ceiling is the highest priority in the system. There are cases when this is fine (e.g. when all critical sections are *relatively* small as in the Linux kernel) and there are cases when this is not (e.g. when some user-level critical sections accessed only by lower-criticality tasks are *relatively* long compared to the timing constraints of higher priority tasks.

Component 1: Let a priority ceiling be associated with each user-level mutex. When the mutex is locked, the corresponding critical section is executed at the priority ceiling value. The developer can choose this to be the highest priority in the system in which case it becomes a non-preemptive critical section. In addition, we could allow mutexes to either pick basic priority inheritance (desirable for local mutexes?) or the priority ceiling version (desirable for global mutexes shared across processors/cores).

Component 2: The queueing discipline for tasks pending on locked mutexes is the second policy under discussion. Jim argues that it should be FIFO. Imagine a bunch of lower-priority tasks stuck in front of a higher-priority task, and the "priority inversion" time can be considerable. (Some including me would argue that it goes against the grain of using priority-based scheduling for real-time systems in the first place. Why not just use FIFO scheduling?). However, a practical compromise is easy to reach as well. Let the queueing policy (FIFO or priority-based) on mutexes be an option. FIFO for a "local" mutex would not be very helpful. And priority-queueing for a "global" mutex would help tasks with tight timing constraints.

Proposal Summary

1. Associate a priority ceiling (priority at which critical sections are executed) with each (global) mutex. A macro HIGHEST_CEILING could represent a value that is higher than any other application-level priority.
2. Allow the developer to specify the queueing discipline on a (global) mutex. MUTEX_FIFO_QUEUEING and MUTEX_PRIORITY_QUEUEING are allowed options.

I would appreciate any comments. Thanks for reading through a lot (if you reached this far :-)

---
Raj
http://www.ece.cmu.edu/~raj

P.S. Some relevant references from a couple of decades back.

[1] Rajkumar, R., "Real-Time Synchronization Protocols for Shared Memory Multiprocessors". The Tenth International Conference on Distributed Computing Systems, 1990.
[2] Rajkumar, R., Sha, L., and Lehoczky J.P. "Real-Time Synchronization Protocols for Multiprocessors". Proceedings of the
IEEE Real-Time Systems Symposium, December 1988, pp. 259-269.

Global locking is costly - global critical sections could be moved to a "global synchronization processor" (and is described in one of the articles above).

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Ira W. Snyder: "Re: [PATCH 2/4] edac: mpc85xx add mpc83xx support"
Previous message: Rajiv Andrade: "[PATCH] TPM: DATA_EXPECT bit check bypass"
In reply to: Ted Baker: "Re: RFC for a new Scheduling policy/class in the Linux-kernel"
Next in thread: Raistlin: "Re: RFC for a new Scheduling policy/class in the Linux-kernel"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]