Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch

From: Paul E. McKenney
Date: Mon Jul 31 2006 - 10:36:05 EST

On Fri, Jul 28, 2006 at 07:50:37PM -0700, Bill Huey wrote:
> On Fri, Jul 28, 2006 at 07:18:29PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 28, 2006 at 03:27:16PM -0700, Bill Huey wrote:
> > > What is that ? like randomly boosting without tracking which thread is
> > > inside an RCU critical section ?
> >
> > Perhaps a better way to put it would be that a thread preempted in
> > an RCU read-side critical section boosts itself, and tracks the fact
> > that it boosted itself in its tasks structure.
> >
> > The second boost would be from some other task, but if the task had
> > already boosted itself, the de-boosting would already be taken care of
> > at the next rcu_read_unlock() -- but as mentioned earlier in this
> > thread, you only boost someone else if they are not currently running.
> The problem here is that I can't see how it's going to boost the thread
> if the things doing the RCU sync can't track the list of readers. It
> might be record in the trask struct, now what ?

The first boost is performed by the task itself the first time there
is a preemption attempt (or the first time it blocks on a mutex), so
no need to track the list of readers in that case. The trick is that
there is no benefit to boosting someone who is already running -- we
only need to boost the first time they are considering blocking.

If there is a need for a second "boost to the sky" in case of excessively
delayed grace period (or to provide deterministic synchronize_rcu()
latency), then we need a list only of those RCU readers who have attempted
to block at least once thus far in their current RCU read-side critical
section. But I was putting this off until I get the simple case right.
Cowardly of me, I know! ;-)

Finally found Steve Rostedt's PI document (in 2.6.18-rc2), very helpful
(though I suppose I should reserve judgement until after I get this

> > > > 2. RCU reader boosting a lock holder. This ends up being a
> > > > combination of #1 (because the act of blocking on a lock implies
> > > > an "out of nowhere" priority boost) and normal lock boosting.
> > >
> > > Lock holder as in mutex held below and RCU critical section ?
> >
> > Lock holder as in task 0 holds the lock, perhaps in an RCU read-side
> > critical section and perhaps not. Task 1 is in an RCU read-side
> > critical section and attempts to acquire the lock. Task 1 must block,
> > because task 0 still holds the lock. Task 1 must boost itself before
> > blocking, and must donate its boosted priority to task 0.
> Ok (thinking...)
> > > > 3. A call_rcu() or synchronize_rcu() boosting all readers. I am
> > > > not sure we really need this, but in case we do... One would
> > > > need an additional prio_booster for each task to be boosted,
> > > > right? This would seem to require an additional prio_booster
> > > > struct in each task structure.
> > >
> > > This needs a notion of RCU read side ownership to boost those preempted
> > > threads.
> >
> > I am getting the impression that #3 is something to leave aside for now.
> ...
> > The idea is that none of this stuff ever happens except in cases where
> > the RCU read-side critical section blocks, in which case all this is
> ...
> > in the noise compared to the context switch. The sole exception to
> > this is that rcu_read_unlock() must check to see if it has been boosted,
> > and deboost itself if so. I don't particularly like the additional
> > comparison, but it should not be too expensive.
> Oh, blocks as is gets shoved into a wait queue for a PI enabled lock.


> > > Don't know what to think about it other than some kind of tracking or
> > > boosting logic in the per CPU run queue or the task struct itself during
> > > the boost operation. But you're still stuck with the problem of what
> > > to boost and how to find that out during an RCU sync side. It's still
> > > an ownership problem unless Esben can think of another way of getting
> > > around that problem.
> >
> > One idea is to put tasks that block in RCU read-side critical sections
> > on a list -- again, the hope is that the overhead is in the noise compared
> > to the context switch.
> Only way to find out is to try it.

Or to try without it and see what happens.

> > > That's why I suggested a priority ceiling or per CPU priority threshold
> > > tracking (+ CPU binding) the priority of the irq-threads and stuff. It's
> > > a simple hack to restore the cheesy preempt count stuff without having
> > > to revert to invasive ownership tracking for each reader.
> > >
> > > It's just an idea. Maybe it'll be useful to you.
> >
> > Let me make sure I understand what you are suggesting -- sounds to me
> > like a check in preempt_schedule(). If the task to be preempted is
> > higher priority than the ceiling, the preemption request is refused.
> >
> > Or am I missing part of your proposal?
> Something like at all preemption points, cond_resched() and friends (scheduler
> tick) to an additional check against a value in a threads own CPU run queue
> struct to see if it should permit the preemption or not. I'm thinking about
> ways to avoid doing an expensive run queue lock during a task's priority
> manipulation and instead have some other kind of logic orthogonal to that so
> that it can bypass this overhead. A value in a run queue that can be checked
> against in order to prevent a preemption from happen might be able to side
> step the need for a doing a full run queue lock to reorder a tasks priority
> ranking.
> If a ceiling or threshold was defined for RCU (another somewhat complicated
> topic) it could prevent the RCU critical section from preempting other
> than with other SCHED_FIFO tasks at and above that priority, if you choose
> a threshold at that priority. That'll be apart of the runtime configuratio
> of the system. You'd have to cpu_get/put to get that value so that you get
> at it safely, read or write to it, and maybe save and restore that value on
> entry and exit respectively. You'll also have to set a field in the task
> struct to prevent it from migration to another CPU and make sure that's
> modifying the right stuff on the right CPU.
> It's a possible solution to a rather difficult problem. What do you think ?
> too much of a hack ?

I am not sure -- seems to be a dual approach to boosting the RCU reader's
priority in the preemption case. I suspect that a real priority boost
would still be needed in the case where the RCU reader blocks on a mutex,
since we need the priority inheritance to happen in that case, right?

> (I'm into -rt development again after a good OLS and I'm trying to get my
> kernel development up and going so that I can help out)

Sounds very good!!!

Thanx, Paul
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at