Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch

From: Esben Nielsen
Date: Fri Jul 28 2006 - 06:36:05 EST

I have considered an idea to make this work with the PI: add the ability to attach a waiter that does not refer to a lock to the PI list. I think a few subsystems could use that if they temporarily want to boost a task in a consistent way (HR-timers is one). After a little renaming, the boosting part gets separated out of rt_mutex_waiter:

struct prio_booster {
	struct plist_node booster_list_entry;
	...
};

void add_prio_booster(struct task_struct *task, struct prio_booster *booster);
void remove_prio_booster(struct task_struct *task, struct prio_booster *booster);
void change_prio_booster(struct task_struct *task, struct prio_booster *booster,
			 int new_prio);

(These functions take care of doing or triggering a lock-chain traversal if needed.) Then change

struct rt_mutex_waiter {
	struct prio_booster booster;
	...
};

There are issues with lock ordering between task->pi_lock (which should be renamed to task->prio_lock) and rq->lock. The lock ordering probably has to be reversed, thus integrating the boosting system directly into the scheduler instead of into the rtmutex subsystem.


On Thu, 27 Jul 2006, Paul E. McKenney wrote:


This is a very crude not-for-inclusion patch that boosts priority of
RCU read-side critical sections, but only when they are preempted, and
only to the highest non-RT priority. The rcu_read_unlock() primitive
does the unboosting. There are a large number of things that this patch
does -not- do, including:

o Boost RCU read-side critical sections to the highest possible
priority. One might wish to do this in OOM situations. Or
if the grace period is extending too long. I played with this
a bit some months back, see:

to see what I was thinking. Or similarly-numbered patches,
see for the full
list. Lots of subtly broken approaches for those who are
interested in subtle breakage.

One must carefully resolve races between boosting and the
to-be-boosted task slipping out of its RCU read-side critical
section. My thought has been to grab the to-be-boosted task
by the throat, and only boost it if it is (1) still in an
RCU read-side critical section and (2) not running. If you
try boosting a thread that is already running, the races between
boosting and rcu_read_unlock() are insanely complex, particularly
for implementations of rcu_read_unlock() that don't use atomic
instructions or memory barriers. ;-)

Much better to either have the thread boost itself or to make
sure the thread is not running if having someone else boost it.

o Boost RCU read-side critical sections that must block waiting
for a non-raw spinlock. The URL noted above shows one approach
I was messing with some time back.

o Boost RCU read-side critical sections based on the priority of
tasks doing synchronize_rcu() and call_rcu(). (This was something
Steve Rostedt suggested at OLS.) One thing at a time! ;-)

o Implementing preemption thresholding, as suggested by Bill Huey.
I am taking the coward's way out on this for the moment in order
to improve the odds of getting something useful done (as opposed
to getting something potentially even more useful only half done).

Anyway, the following patch compiles and passes lightweight "smoke" tests.
It almost certainly has fatal flaws -- for example, I don't see how it
would handle yet another task doing a lock-based priority boost between
the time the task is RCU-boosted and the time it de-boosts itself in
rcu_read_unlock().

Again, not for inclusion in its present form, but any enlightenment would
be greatly appreciated.

(Thomas, you did ask for this!!!)

Thanx, Paul

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> (but not for inclusion)

include/linux/init_task.h | 1 +
include/linux/rcupdate.h | 2 ++
include/linux/sched.h | 3 +++
kernel/rcupreempt.c | 11 +++++++++++
kernel/sched.c | 8 ++++++++
5 files changed, 25 insertions(+)

diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/init_task.h linux-2.6.17-rt7-rcubp/include/linux/init_task.h
--- linux-2.6.17-rt7/include/linux/init_task.h 2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/include/linux/init_task.h 2006-07-27 14:34:20.000000000 -0700
@@ -89,6 +89,7 @@ extern struct group_info init_groups;
.prio = MAX_PRIO-20, \
.static_prio = MAX_PRIO-20, \
.normal_prio = MAX_PRIO-20, \
+ .rcu_prio = MAX_PRIO, \
.policy = SCHED_NORMAL, \
.cpus_allowed = CPU_MASK_ALL, \
.mm = NULL, \
diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/rcupdate.h linux-2.6.17-rt7-rcubp/include/linux/rcupdate.h
--- linux-2.6.17-rt7/include/linux/rcupdate.h 2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/include/linux/rcupdate.h 2006-07-27 14:34:20.000000000 -0700
@@ -175,6 +175,8 @@ extern int rcu_needs_cpu(int cpu);

#else /* #ifndef CONFIG_PREEMPT_RCU */

+#define RCU_PREEMPT_BOOST_PRIO MAX_USER_RT_PRIO /* Initial boost level. */
#define rcu_qsctr_inc(cpu)
#define rcu_bh_qsctr_inc(cpu)
#define call_rcu_bh(head, rcu) call_rcu(head, rcu)
diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/sched.h linux-2.6.17-rt7-rcubp/include/linux/sched.h
--- linux-2.6.17-rt7/include/linux/sched.h 2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/include/linux/sched.h 2006-07-27 14:34:20.000000000 -0700
@@ -851,6 +851,9 @@ struct task_struct {
int oncpu;
int prio, static_prio, normal_prio;
+ int rcu_prio;
struct list_head run_list;
prio_array_t *array;

diff -urpNa -X dontdiff linux-2.6.17-rt7/kernel/rcupreempt.c linux-2.6.17-rt7-rcubp/kernel/rcupreempt.c
--- linux-2.6.17-rt7/kernel/rcupreempt.c 2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/kernel/rcupreempt.c 2006-07-27 14:34:20.000000000 -0700
@@ -147,6 +147,17 @@ rcu_read_lock(void)
smp_mb__after_atomic_inc(); /* might optimize out... */
+ if (unlikely(current->rcu_prio <= RCU_PREEMPT_BOOST_PRIO)) {
+ int new_prio = MAX_PRIO;
+ current->rcu_prio = MAX_PRIO;
+ if (new_prio > current->static_prio)
+ new_prio = current->static_prio;
+ if (new_prio > current->normal_prio)
+ new_prio = current->normal_prio;
+ /* How to account for lock-based prio boost? */
+ rt_mutex_setprio(current, new_prio);
+ }
trace_special((unsigned long) current->rcu_flipctr1,
(unsigned long) current->rcu_flipctr2,
diff -urpNa -X dontdiff linux-2.6.17-rt7/kernel/sched.c linux-2.6.17-rt7-rcubp/kernel/sched.c
--- linux-2.6.17-rt7/kernel/sched.c 2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/kernel/sched.c 2006-07-27 14:58:40.000000000 -0700
@@ -3685,6 +3685,14 @@ asmlinkage void __sched preempt_schedule

+ if (unlikely(current->rcu_read_lock_nesting > 0) &&
+ (current->rcu_prio > RCU_PREEMPT_BOOST_PRIO)) {
+ current->rcu_prio = RCU_PREEMPT_BOOST_PRIO;
+ if (current->rcu_prio < current->prio)
+ rt_mutex_setprio(current, current->rcu_prio);
+ }
