Re: RT Mutex patch and tester [PREEMPT_RT]

From: Esben Nielsen
Date: Fri Jan 27 2006 - 10:19:28 EST


I have patched against 2.6.15-rt15, and I have found a hyperthreaded P4
machine to test on. It works fine on that one.

Esben

On Mon, 23 Jan 2006, Esben Nielsen wrote:

> On Mon, 23 Jan 2006, Steven Rostedt wrote:
>
> > On Mon, 2006-01-23 at 10:33 +0100, Esben Nielsen wrote:
> > > On Sun, 22 Jan 2006, Bill Huey wrote:
> > >
> > > > On Mon, Jan 23, 2006 at 01:20:12AM +0100, Esben Nielsen wrote:
> > > > > Here is the problem:
> > > > >
> > > > > Task B (non-RT) takes the BKL. It then takes mutex 1. Then B
> > > > > tries to lock mutex 2, which is owned by task C. B blocks and releases the
> > > > > BKL. Our RT task A comes along and tries to get mutex 1. It boosts task B,
> > > > > which boosts task C, which releases mutex 2. Now B can continue? No, it has
> > > > > to reacquire the BKL! The net effect is that our RT task A waits for the BKL
> > > > > to be released without ever calling into a module using the BKL. But just
> > > > > because somebody in some non-RT code called into a module otherwise considered
> > > > > safe for RT usage with the BKL held, A must wait on the BKL!
> > > >
> > > > True, that's major suckage, but I can't name a single place in the kernel that
> > > > does that.
> > >
> > > Sounds good. But someone might put it in...
> >
> > Hmm, I wouldn't be surprised if this is done somewhere in the VFS layer.
> >
> > >
> > > > Remember, the BKL is now preemptible, so the place that it might
> > > > sleep similar to the above would be in spinlock_t definitions.
> > > I can't see that from how it works. It is explicitly made such that you
> > > are allowed to use semaphores with the BKL held - and such that the BKL
> > > is released if you do.
> >
> > Correct. I hope you didn't remove my comment in the rt.c about BKL
> > being a PITA :) (Ingo was nice enough to change my original patch to use
> > the acronym.)
>
> I left it there it seems :-)
>
> >
> > >
> > > > But the BKL is held across schedule()s
> > > > so that the BKL semantics are preserved.
> > > Only for the spinlock_t (now rt_mutex) operations, not for
> > > semaphore/mutex operations.
> > > > Contending under a priority inheritance operation isn't too much of
> > > > a problem anyway, since the use of it already makes that path
> > > > indeterminate.
> > > The problem is that you might hit the BKL because of what some other
> > > low-priority task does, thus making your RT code non-deterministic.
> >
> > I disagree here. The fact that you grab a semaphore that may also be
> > grabbed by a path holding the BKL means that grabbing that semaphore
> > may block on the BKL too. So the time to grab a semaphore that can
> > also be taken while holding the BKL is the length of the semaphore's
> > critical section + the length of the longest BKL hold.
> Exactly. What is "the length of the longest BKL hold"? (See below.)
>
> >
> > Just don't let your RT tasks grab semaphores that can be grabbed while
> > also holding the BKL :)
>
> How are you to _know_ that? Even though your code - or any code you
> call, or any code called from code you call - hasn't changed, this
> situation can arise!
>
> >
> > But the main point is that it is still deterministic. Just that it may
> > be longer than one thinks.
> >
> I don't consider "the length of the longest BKL hold" deterministic.
> People might traverse all kinds of weird lists and data structures
> while holding the BKL.
>
> > >
> > > > Even under contention, a higher-priority task above A can still run,
> > > > since the kernel is now preemptible even when manipulating the BKL.
> > >
> > > No, A waits for the BKL because it waits for B, which waits for the BKL.
> >
> > Right.
> >
> > -- Steve
> >
> > PS. I might actually get around to testing your patch today :) That is,
> > if -rt12 passes all my tests.
> >
>
> Sounds nice :-) I cross my fingers...
>
> Esben
>
>
>
>
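
To make the scenario quoted above concrete, here is a minimal illustrative
sketch. It is not code from the kernel or from the patch below: m1 and m2 are
hypothetical semaphores (rt_mutex-based under PREEMPT_RT), and task_a()/task_b()
only stand in for the call pattern that produces the inversion and for the
worst-case wait that Steve's argument bounds:

#include <linux/smp_lock.h>     /* lock_kernel() / unlock_kernel() */
#include <asm/semaphore.h>      /* DECLARE_MUTEX(), down(), up()   */

static DECLARE_MUTEX(m1);       /* hypothetical locks, for illustration only */
static DECLARE_MUTEX(m2);

/* Task B, SCHED_NORMAL */
static void task_b(void)
{
        lock_kernel();          /* B now holds the BKL                        */
        down(&m1);
        down(&m2);              /* m2 is owned by C: B sleeps here, and the
                                 * BKL is auto-released at schedule() time    */
        /* ... */
        up(&m2);
        up(&m1);
        unlock_kernel();
}

/* RT task A, SCHED_FIFO */
static void task_a(void)
{
        down(&m1);              /* boosts B, which boosts C; C drops m2, but
                                 * B must re-acquire the BKL before it can run
                                 * on and release m1 - so A transitively waits
                                 * on whoever holds the BKL, even though A
                                 * never takes the BKL itself                 */
        /* ... */
        up(&m1);
}

/*
 * Rough worst-case wait for A, per the discussion above:
 *
 *      wait(m1) <= crit(m1) + crit(m2) + longest BKL hold
 */
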
diff -upr linux-2.6.15-rt15-orig/fs/proc/array.c linux-2.6.15-rt15-pipatch/fs/proc/array.c
--- linux-2.6.15-rt15-orig/fs/proc/array.c 2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/fs/proc/array.c 2006-01-24 18:56:07.000000000 +0100
@@ -295,6 +295,14 @@ static inline char *task_cap(struct task
cap_t(p->cap_effective));
}

+
+static char *show_blocked_on(task_t *task, char *buffer)
+{
+ pid_t pid = get_blocked_on(task);
+ return buffer + sprintf(buffer,"BlckOn: %d\n",pid);
+}
+
+
int proc_pid_status(struct task_struct *task, char * buffer)
{
char * orig = buffer;
@@ -313,6 +321,7 @@ int proc_pid_status(struct task_struct *
#if defined(CONFIG_ARCH_S390)
buffer = task_show_regs(task, buffer);
#endif
+ buffer = show_blocked_on(task,buffer);
return buffer - orig;
}

diff -upr linux-2.6.15-rt15-orig/include/linux/rt_lock.h linux-2.6.15-rt15-pipatch/include/linux/rt_lock.h
--- linux-2.6.15-rt15-orig/include/linux/rt_lock.h 2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/include/linux/rt_lock.h 2006-01-24 18:56:07.000000000 +0100
@@ -36,6 +36,7 @@ struct rt_mutex {
unsigned long acquire_eip;
char *name, *file;
int line;
+ int verbose;
# endif
# ifdef CONFIG_DEBUG_PREEMPT
int was_preempt_off;
@@ -67,7 +68,7 @@ struct rt_mutex_waiter {

#ifdef CONFIG_DEBUG_DEADLOCKS
# define __RT_MUTEX_DEADLOCK_DETECT_INITIALIZER(lockname) \
- , .name = #lockname, .file = __FILE__, .line = __LINE__
+ , .name = #lockname, .file = __FILE__, .line = __LINE__, .verbose =0
#else
# define __RT_MUTEX_DEADLOCK_DETECT_INITIALIZER(lockname)
#endif
diff -upr linux-2.6.15-rt15-orig/include/linux/sched.h linux-2.6.15-rt15-pipatch/include/linux/sched.h
--- linux-2.6.15-rt15-orig/include/linux/sched.h 2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/include/linux/sched.h 2006-01-24 18:56:07.000000000 +0100
@@ -1652,6 +1652,8 @@ extern void recalc_sigpending(void);

extern void signal_wake_up(struct task_struct *t, int resume_stopped);

+extern pid_t get_blocked_on(task_t *task);
+
/*
* Wrappers for p->thread_info->cpu access. No-op on UP.
*/
diff -upr linux-2.6.15-rt15-orig/init/main.c linux-2.6.15-rt15-pipatch/init/main.c
--- linux-2.6.15-rt15-orig/init/main.c 2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/init/main.c 2006-01-24 18:56:07.000000000 +0100
@@ -616,6 +616,12 @@ static void __init do_initcalls(void)
printk(KERN_WARNING "error in initcall at 0x%p: "
"returned with %s\n", *call, msg);
}
+ if (initcall_debug) {
+ printk(KERN_DEBUG "Returned from initcall 0x%p", *call);
+ print_fn_descriptor_symbol(": %s()", (unsigned long) *call);
+ printk("\n");
+ }
+
}

/* Make sure there is no pending stuff from the initcall sequence */
diff -upr linux-2.6.15-rt15-orig/kernel/rt.c linux-2.6.15-rt15-pipatch/kernel/rt.c
--- linux-2.6.15-rt15-orig/kernel/rt.c 2006-01-24 18:50:37.000000000 +0100
+++ linux-2.6.15-rt15-pipatch/kernel/rt.c 2006-01-24 18:56:07.000000000 +0100
@@ -36,7 +36,10 @@
* (also by Steven Rostedt)
* - Converted single pi_lock to individual task locks.
*
+ * By Esben Nielsen:
+ * Doing priority inheritance with the help of the scheduler.
*/
+
#include <linux/config.h>
#include <linux/rt_lock.h>
#include <linux/sched.h>
@@ -58,18 +61,26 @@
* To keep from having a single lock for PI, each task and lock
* has their own locking. The order is as follows:
*
+ * lock->wait_lock -> sometask->pi_lock
+ * You should only hold one wait_lock and one pi_lock
* blocked task->pi_lock -> lock->wait_lock -> owner task->pi_lock.
*
- * This is safe since a owner task should never block on a lock that
- * is owned by a blocking task. Otherwise you would have a deadlock
- * in the normal system.
- * The same goes for the locks. A lock held by one task, should not be
- * taken by task that holds a lock that is blocking this lock's owner.
+ * lock->wait_lock protects everything inside the lock and all the waiters
+ * on lock->wait_list.
+ * sometask->pi_lock protects everything on task-> related to the rt_mutex.
+ *
+ * Invariants - must be true when unlocking lock->wait_lock:
+ * If lock->wait_list is non-empty
+ * 1) lock_owner(lock) points to a valid thread.
+ * 2) The first and only the first waiter on the list must be on
+ * lock_owner(lock)->task->pi_waiters.
+ *
+ * A waiter struct is on the lock->wait_list iff waiter->ti!=NULL.
*
- * A task that is about to grab a lock is first considered to be a
- * blocking task, even if the task successfully acquires the lock.
- * This is because the taking of the locks happen before the
- * task becomes the owner.
+ * Strategy for boosting lock chain:
+ * task A blocked on lock 1 owned by task B blocked on lock 2 etc..
+ * A sets B's prio up and wakes B. B tries to get lock 2 again and fails.
+ * B therefore boosts C.
*/

/*
@@ -117,6 +128,7 @@
* This flag is good for debugging the PI code - it makes all tasks
* in the system fall under PI handling. Normally only SCHED_FIFO/RR
* tasks are PI-handled:
+ *
*/
#define ALL_TASKS_PI 0

@@ -132,6 +144,19 @@
# define __CALLER0__
#endif

+int rt_mutex_debug = 0;
+
+#ifdef CONFIG_PREEMPT_RT
+static int is_kernel_lock(struct rt_mutex *lock)
+{
+ return (lock == &kernel_sem.lock);
+
+}
+#else
+#define is_kernel_lock(lock) (0)
+#endif
+
+
#ifdef CONFIG_DEBUG_DEADLOCKS
/*
* We need a global lock when we walk through the multi-process
@@ -311,7 +336,7 @@ void check_preempt_wakeup(struct task_st
}
}

-static inline void
+static void
account_mutex_owner_down(struct task_struct *task, struct rt_mutex *lock)
{
if (task->lock_count >= MAX_LOCK_STACK) {
@@ -325,7 +350,7 @@ account_mutex_owner_down(struct task_str
task->lock_count++;
}

-static inline void
+static void
account_mutex_owner_up(struct task_struct *task)
{
if (!task->lock_count) {
@@ -390,6 +415,21 @@ static void printk_lock(struct rt_mutex
}
}

+static void debug_lock(struct rt_mutex *lock,
+ const char *fmt,...)
+{
+ if(rt_mutex_debug && lock->verbose) {
+ va_list args;
+ printk_task(current);
+
+ va_start(args, fmt);
+ vprintk(fmt, args);
+ va_end(args);
+ printk_lock(lock, 1);
+ }
+}
+
+
static void printk_waiter(struct rt_mutex_waiter *w)
{
printk("-------------------------\n");
@@ -534,10 +574,9 @@ static int check_deadlock(struct rt_mute
* Special-case: the BKL self-releases at schedule()
* time so it can never deadlock:
*/
-#ifdef CONFIG_PREEMPT_RT
- if (lock == &kernel_sem.lock)
+ if (is_kernel_lock(lock))
return 0;
-#endif
+
ti = lock_owner(lock);
if (!ti)
return 0;
@@ -562,13 +601,8 @@ static int check_deadlock(struct rt_mute
trace_local_irq_disable(ti);
return 0;
}
-#ifdef CONFIG_PREEMPT_RT
- /*
- * Skip the BKL:
- */
- if (lockblk == &kernel_sem.lock)
+ if(is_kernel_lock(lockblk))
return 0;
-#endif
/*
* Ugh, something corrupted the lock data structure?
*/
@@ -656,7 +690,7 @@ restart:
list_del_init(curr);
trace_unlock_irqrestore(&trace_lock, flags, ti);

- if (lock == &kernel_sem.lock) {
+ if (is_kernel_lock(lock)) {
printk("BUG: %s/%d, BKL held at task exit time!\n",
task->comm, task->pid);
printk("BKL acquired at: ");
@@ -724,28 +758,14 @@ restart:
return err;
}

-#endif
-
-#if ALL_TASKS_PI && defined(CONFIG_DEBUG_DEADLOCKS)
-
-static void
-check_pi_list_present(struct rt_mutex *lock, struct rt_mutex_waiter *waiter,
- struct thread_info *old_owner)
+#else /* ifdef CONFIG_DEBUG_DEADLOCKS */
+static inline void debug_lock(struct rt_mutex *lock,
+ const char *fmt,...)
{
- struct rt_mutex_waiter *w;
-
- _raw_spin_lock(&old_owner->task->pi_lock);
- TRACE_WARN_ON_LOCKED(plist_node_empty(&waiter->pi_list));
-
- plist_for_each_entry(w, &old_owner->task->pi_waiters, pi_list) {
- if (w == waiter)
- goto ok;
- }
- TRACE_WARN_ON_LOCKED(1);
-ok:
- _raw_spin_unlock(&old_owner->task->pi_lock);
- return;
}
+#endif /* else CONFIG_DEBUG_DEADLOCKS */
+
+#if ALL_TASKS_PI && defined(CONFIG_DEBUG_DEADLOCKS)

static void
check_pi_list_empty(struct rt_mutex *lock, struct thread_info *old_owner)
@@ -781,274 +801,115 @@ check_pi_list_empty(struct rt_mutex *loc

#endif

-/*
- * Move PI waiters of this lock to the new owner:
- */
-static void
-change_owner(struct rt_mutex *lock, struct thread_info *old_owner,
- struct thread_info *new_owner)
+static inline int boosting_waiter(struct rt_mutex_waiter *waiter)
{
- struct rt_mutex_waiter *w, *tmp;
- int requeued = 0, sum = 0;
-
- if (old_owner == new_owner)
- return;
-
- SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&old_owner->task->pi_lock));
- SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&new_owner->task->pi_lock));
- plist_for_each_entry_safe(w, tmp, &old_owner->task->pi_waiters, pi_list) {
- if (w->lock == lock) {
- trace_special_pid(w->ti->task->pid, w->ti->task->prio, w->ti->task->normal_prio);
- plist_del(&w->pi_list);
- w->pi_list.prio = w->ti->task->prio;
- plist_add(&w->pi_list, &new_owner->task->pi_waiters);
- requeued++;
- }
- sum++;
- }
- trace_special(sum, requeued, 0);
+ return ALL_TASKS_PI || rt_prio(waiter->list.prio);
}

-int pi_walk, pi_null, pi_prio, pi_initialized;
-
-/*
- * The lock->wait_lock and p->pi_lock must be held.
- */
-static void pi_setprio(struct rt_mutex *lock, struct task_struct *task, int prio)
+static int calc_pi_prio(task_t *task)
{
- struct rt_mutex *l = lock;
- struct task_struct *p = task;
- /*
- * We don't want to release the parameters locks.
- */
-
- if (unlikely(!p->pid)) {
- pi_null++;
- return;
+ int prio = task->normal_prio;
+ if(!plist_head_empty(&task->pi_waiters)) {
+ struct rt_mutex_waiter *waiter =
+ plist_first_entry(&task->pi_waiters, struct rt_mutex_waiter, pi_list);
+ prio = min(waiter->pi_list.prio,prio);
}

- SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
- SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&p->pi_lock));
-#ifdef CONFIG_DEBUG_DEADLOCKS
- pi_prio++;
- if (p->policy != SCHED_NORMAL && prio > normal_prio(p)) {
- TRACE_OFF();
-
- printk("huh? (%d->%d??)\n", p->prio, prio);
- printk("owner:\n");
- printk_task(p);
- printk("\ncurrent:\n");
- printk_task(current);
- printk("\nlock:\n");
- printk_lock(lock, 1);
- dump_stack();
- trace_local_irq_disable(ti);
- }
-#endif
- /*
- * If the task is blocked on some other task then boost that
- * other task (or tasks) too:
- */
- for (;;) {
- struct rt_mutex_waiter *w = p->blocked_on;
-#ifdef CONFIG_DEBUG_DEADLOCKS
- int was_rt = rt_task(p);
-#endif
-
- mutex_setprio(p, prio);
-
- /*
- * The BKL can really be a pain. It can happen where the
- * BKL is being held by one task that is just about to
- * block on another task that is waiting for the BKL.
- * This isn't a deadlock, since the BKL is released
- * when the task goes to sleep. This also means that
- * all holders of the BKL are not blocked, or are just
- * about to be blocked.
- *
- * Another side-effect of this is that there's a small
- * window where the spinlocks are not held, and the blocked
- * process hasn't released the BKL. So if we are going
- * to boost the owner of the BKL, stop after that,
- * since that owner is either running, or about to sleep
- * but don't go any further or we are in a loop.
- */
- if (!w || unlikely(p->lock_depth >= 0))
- break;
- /*
- * If the task is blocked on a lock, and we just made
- * it RT, then register the task in the PI list and
- * requeue it to the wait list:
- */
-
- /*
- * Don't unlock the original lock->wait_lock
- */
- if (l != lock)
- _raw_spin_unlock(&l->wait_lock);
- l = w->lock;
- TRACE_BUG_ON_LOCKED(!lock);
+ return prio;

-#ifdef CONFIG_PREEMPT_RT
- /*
- * The current task that is blocking can also the one
- * holding the BKL, and blocking on a task that wants
- * it. So if it were to get this far, we would deadlock.
- */
- if (unlikely(l == &kernel_sem.lock) && lock_owner(l) == current_thread_info()) {
- /*
- * No locks are held for locks, so fool the unlocking code
- * by thinking the last lock was the original.
- */
- l = lock;
- break;
- }
-#endif
-
- if (l != lock)
- _raw_spin_lock(&l->wait_lock);
-
- TRACE_BUG_ON_LOCKED(!lock_owner(l));
-
- if (!plist_node_empty(&w->pi_list)) {
- TRACE_BUG_ON_LOCKED(!was_rt && !ALL_TASKS_PI && !rt_task(p));
- /*
- * If the task is blocked on a lock, and we just restored
- * it from RT to non-RT then unregister the task from
- * the PI list and requeue it to the wait list.
- *
- * (TODO: this can be unfair to SCHED_NORMAL tasks if they
- * get PI handled.)
- */
- plist_del(&w->pi_list);
- } else
- TRACE_BUG_ON_LOCKED((ALL_TASKS_PI || rt_task(p)) && was_rt);
-
- if (ALL_TASKS_PI || rt_task(p)) {
- w->pi_list.prio = prio;
- plist_add(&w->pi_list, &lock_owner(l)->task->pi_waiters);
- }
-
- plist_del(&w->list);
- w->list.prio = prio;
- plist_add(&w->list, &l->wait_list);
-
- pi_walk++;
-
- if (p != task)
- _raw_spin_unlock(&p->pi_lock);
-
- p = lock_owner(l)->task;
- TRACE_BUG_ON_LOCKED(!p);
- _raw_spin_lock(&p->pi_lock);
- /*
- * If the dependee is already higher-prio then
- * no need to boost it, and all further tasks down
- * the dependency chain are already boosted:
- */
- if (p->prio <= prio)
- break;
- }
- if (l != lock)
- _raw_spin_unlock(&l->wait_lock);
- if (p != task)
- _raw_spin_unlock(&p->pi_lock);
}

-/*
- * Change priority of a task pi aware
- *
- * There are several aspects to consider:
- * - task is priority boosted
- * - task is blocked on a mutex
- *
- */
-void pi_changeprio(struct task_struct *p, int prio)
+static void fix_prio(task_t *task)
{
- unsigned long flags;
- int oldprio;
-
- spin_lock_irqsave(&p->pi_lock,flags);
- if (p->blocked_on)
- spin_lock(&p->blocked_on->lock->wait_lock);
-
- oldprio = p->normal_prio;
- if (oldprio == prio)
- goto out;
-
- /* Set normal prio in any case */
- p->normal_prio = prio;
-
- /* Check, if we can safely lower the priority */
- if (prio > p->prio && !plist_head_empty(&p->pi_waiters)) {
- struct rt_mutex_waiter *w;
- w = plist_first_entry(&p->pi_waiters,
- struct rt_mutex_waiter, pi_list);
- if (w->ti->task->prio < prio)
- prio = w->ti->task->prio;
+ int prio = calc_pi_prio(task);
+ if(task->prio > prio) {
+ /* Boost him */
+ mutex_setprio(task,prio);
+ if(task->blocked_on) {
+ /* Let it run to boost its lock */
+ wake_up_process_mutex(task);
+ }
+ }
+ else if(task->prio < prio) {
+ /* Priority too high */
+ if(task->blocked_on) {
+ /* Let it run to unboost its lock */
+ wake_up_process_mutex(task);
+ }
+ else {
+ mutex_setprio(task,prio);
+ }
}
-
- if (prio == p->prio)
- goto out;
-
- /* Is task blocked on a mutex ? */
- if (p->blocked_on)
- pi_setprio(p->blocked_on->lock, p, prio);
- else
- mutex_setprio(p, prio);
- out:
- if (p->blocked_on)
- spin_unlock(&p->blocked_on->lock->wait_lock);
-
- spin_unlock_irqrestore(&p->pi_lock, flags);
-
}

+int pi_walk, pi_null, pi_prio, pi_initialized;
+
/*
* This is called with both the waiter->task->pi_lock and
* lock->wait_lock held.
*/
static void
task_blocks_on_lock(struct rt_mutex_waiter *waiter, struct thread_info *ti,
- struct rt_mutex *lock __EIP_DECL__)
+ struct rt_mutex *lock, int state __EIP_DECL__)
{
+ struct rt_mutex_waiter *old_first;
struct task_struct *task = ti->task;
#ifdef CONFIG_DEBUG_DEADLOCKS
check_deadlock(lock, 0, ti, eip);
/* mark the current thread as blocked on the lock */
waiter->eip = eip;
#endif
+ SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
+ SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&task->pi_lock));
+
+ if(plist_head_empty(&lock->wait_list)) {
+ old_first = NULL;
+ }
+ else {
+ old_first = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list);
+ if(!boosting_waiter(old_first)) {
+ old_first = NULL;
+ }
+ }
+
+
+ _raw_spin_lock(&task->pi_lock);
task->blocked_on = waiter;
waiter->lock = lock;
waiter->ti = ti;
- plist_node_init(&waiter->pi_list, task->prio);
- /*
- * Add SCHED_NORMAL tasks to the end of the waitqueue (FIFO):
- */
- SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&task->pi_lock));
- SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
-#if !ALL_TASKS_PI
- if ((!rt_task(task) &&
- !(lock->mutex_attr & FUTEX_ATTR_PRIORITY_INHERITANCE))) {
- plist_add(&waiter->list, &lock->wait_list);
- set_lock_owner_pending(lock);
- return;
+
+ {
+ /* Fixup the prio of the (current) task here while we have the
+ pi_lock */
+ int prio = calc_pi_prio(task);
+ if(prio!=task->prio) {
+ mutex_setprio(task,prio);
+ }
}
-#endif
- _raw_spin_lock(&lock_owner(lock)->task->pi_lock);
- plist_add(&waiter->pi_list, &lock_owner(lock)->task->pi_waiters);
- /*
- * Add RT tasks to the head:
- */
+
+ plist_node_init(&waiter->list, task->prio);
plist_add(&waiter->list, &lock->wait_list);
- set_lock_owner_pending(lock);
- /*
- * If the waiter has higher priority than the owner
- * then temporarily boost the owner:
- */
- if (task->prio < lock_owner(lock)->task->prio)
- pi_setprio(lock, lock_owner(lock)->task, task->prio);
- _raw_spin_unlock(&lock_owner(lock)->task->pi_lock);
+ set_task_state(task, state);
+ _raw_spin_unlock(&task->pi_lock);
+
+ set_lock_owner_pending(lock);
+
+ if(waiter ==
+ plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list)
+ && boosting_waiter(waiter)) {
+ task_t *owner = lock_owner(lock)->task;
+
+ plist_node_init(&waiter->pi_list, task->prio);
+
+ _raw_spin_lock(&owner->pi_lock);
+ if(old_first) {
+ plist_del(&old_first->pi_list);
+ }
+ plist_add(&waiter->pi_list, &owner->pi_waiters);
+ fix_prio(owner);
+
+ _raw_spin_unlock(&owner->pi_lock);
+ }
}

/*
@@ -1068,6 +929,7 @@ static void __init_rt_mutex(struct rt_mu
lock->name = name;
lock->file = file;
lock->line = line;
+ lock->verbose = 0;
#endif
#ifdef CONFIG_DEBUG_PREEMPT
lock->was_preempt_off = 0;
@@ -1085,20 +947,48 @@ EXPORT_SYMBOL(__init_rwsem);
#endif

/*
- * This must be called with both the old_owner and new_owner pi_locks held.
- * As well as the lock->wait_lock.
+ * This must be called with the lock->wait_lock held.
+ * Must: new_owner!=NULL
+ * Likely: old_owner==NULL
*/
-static inline
+static
void set_new_owner(struct rt_mutex *lock, struct thread_info *old_owner,
struct thread_info *new_owner __EIP_DECL__)
{
+ SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&old_owner->task->pi_lock));
+ SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&new_owner->task->pi_lock));
+ SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
+
if (new_owner)
trace_special_pid(new_owner->task->pid, new_owner->task->prio, 0);
- if (unlikely(old_owner))
- change_owner(lock, old_owner, new_owner);
+ if(old_owner) {
+ account_mutex_owner_up(old_owner->task);
+ }
+#ifdef CONFIG_DEBUG_DEADLOCKS
+ if (trace_on && unlikely(old_owner)) {
+ TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list));
+ list_del_init(&lock->held_list);
+ }
+#endif
lock->owner = new_owner;
- if (!plist_head_empty(&lock->wait_list))
- set_lock_owner_pending(lock);
+ if (!plist_head_empty(&lock->wait_list)) {
+ struct rt_mutex_waiter *next =
+ plist_first_entry(&lock->wait_list,
+ struct rt_mutex_waiter, list);
+ if(boosting_waiter(next)) {
+ if(old_owner) {
+ _raw_spin_lock(&old_owner->task->pi_lock);
+ plist_del(&next->pi_list);
+ _raw_spin_unlock(&old_owner->task->pi_lock);
+ }
+ _raw_spin_lock(&new_owner->task->pi_lock);
+ plist_add(&next->pi_list,
+ &new_owner->task->pi_waiters);
+ set_lock_owner_pending(lock);
+ _raw_spin_unlock(&new_owner->task->pi_lock);
+ }
+ }
+
#ifdef CONFIG_DEBUG_DEADLOCKS
if (trace_on) {
TRACE_WARN_ON_LOCKED(!list_empty(&lock->held_list));
@@ -1109,6 +999,36 @@ void set_new_owner(struct rt_mutex *lock
account_mutex_owner_down(new_owner->task, lock);
}

+
+static void remove_waiter(struct rt_mutex *lock,
+ struct rt_mutex_waiter *waiter,
+ int fixprio)
+{
+ task_t *owner = lock_owner(lock) ? lock_owner(lock)->task : NULL;
+ int first = (waiter==plist_first_entry(&lock->wait_list,
+ struct rt_mutex_waiter, list));
+
+ plist_del(&waiter->list);
+ if(first && owner) {
+ _raw_spin_lock(&owner->pi_lock);
+ if(boosting_waiter(waiter)) {
+ plist_del(&waiter->pi_list);
+ }
+ if(!plist_head_empty(&lock->wait_list)) {
+ struct rt_mutex_waiter *next =
+ plist_first_entry(&lock->wait_list,
+ struct rt_mutex_waiter, list);
+ if(boosting_waiter(next)) {
+ plist_add(&next->pi_list, &owner->pi_waiters);
+ }
+ }
+ if(fixprio) {
+ fix_prio(owner);
+ }
+ _raw_spin_unlock(&owner->pi_lock);
+ }
+}
+
/*
* handle the lock release when processes blocked on it that can now run
* - the spinlock must be held by the caller
@@ -1123,70 +1043,36 @@ pick_new_owner(struct rt_mutex *lock, st
struct thread_info *new_owner;

SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
+ SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&old_owner->task->pi_lock));
+
/*
* Get the highest prio one:
*
* (same-prio RT tasks go FIFO)
*/
waiter = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list);
-
-#ifdef CONFIG_SMP
- try_again:
-#endif
+ remove_waiter(lock,waiter,0);
trace_special_pid(waiter->ti->task->pid, waiter->ti->task->prio, 0);

-#if ALL_TASKS_PI
- check_pi_list_present(lock, waiter, old_owner);
-#endif
new_owner = waiter->ti;
- /*
- * The new owner is still blocked on this lock, so we
- * must release the lock->wait_lock before grabing
- * the new_owner lock.
- */
- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_lock(&new_owner->task->pi_lock);
- _raw_spin_lock(&lock->wait_lock);
- /*
- * In this split second of releasing the lock, a high priority
- * process could have come along and blocked as well.
- */
-#ifdef CONFIG_SMP
- waiter = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list);
- if (unlikely(waiter->ti != new_owner)) {
- _raw_spin_unlock(&new_owner->task->pi_lock);
- goto try_again;
- }
-#ifdef CONFIG_PREEMPT_RT
- /*
- * Once again the BKL comes to play. Since the BKL can be grabbed and released
- * out of the normal P1->L1->P2 order, there's a chance that someone has the
- * BKL owner's lock and is waiting on the new owner lock.
- */
- if (unlikely(lock == &kernel_sem.lock)) {
- if (!_raw_spin_trylock(&old_owner->task->pi_lock)) {
- _raw_spin_unlock(&new_owner->task->pi_lock);
- goto try_again;
- }
- } else
-#endif
-#endif
- _raw_spin_lock(&old_owner->task->pi_lock);
-
- plist_del(&waiter->list);
- plist_del(&waiter->pi_list);
- waiter->pi_list.prio = waiter->ti->task->prio;

set_new_owner(lock, old_owner, new_owner __W_EIP__(waiter));
+
+ _raw_spin_lock(&new_owner->task->pi_lock);
/* Don't touch waiter after ->task has been NULLed */
mb();
waiter->ti = NULL;
new_owner->task->blocked_on = NULL;
- TRACE_WARN_ON(save_state != lock->save_state);
-
- _raw_spin_unlock(&old_owner->task->pi_lock);
+#ifdef CAPTURE_LOCK
+ if (!is_kernel_lock(lock)) {
+ new_owner->task->rt_flags |= RT_PENDOWNER;
+ new_owner->task->pending_owner = lock;
+ }
+#endif
_raw_spin_unlock(&new_owner->task->pi_lock);

+ TRACE_WARN_ON(save_state != lock->save_state);
+
return new_owner;
}

@@ -1217,11 +1103,41 @@ static inline void init_lists(struct rt_
}
#endif
#ifdef CONFIG_DEBUG_DEADLOCKS
- if (!lock->held_list.prev && !lock->held_list.next)
+ if (!lock->held_list.prev && !lock->held_list.next) {
INIT_LIST_HEAD(&lock->held_list);
+ lock->verbose = 0;
+ }
#endif
}

+
+static void remove_pending_owner_nolock(task_t *owner)
+{
+ owner->rt_flags &= ~RT_PENDOWNER;
+ owner->pending_owner = NULL;
+}
+
+static void remove_pending_owner(task_t *owner)
+{
+ _raw_spin_lock(&owner->pi_lock);
+ remove_pending_owner_nolock(owner);
+ _raw_spin_unlock(&owner->pi_lock);
+}
+
+int task_is_pending_owner_nolock(struct thread_info *owner,
+ struct rt_mutex *lock)
+{
+ return (lock_owner(lock) == owner) &&
+ (owner->task->pending_owner == lock);
+}
+int task_is_pending_owner(struct thread_info *owner, struct rt_mutex *lock)
+{
+ int res;
+ _raw_spin_lock(&owner->task->pi_lock);
+ res = task_is_pending_owner_nolock(owner,lock);
+ _raw_spin_unlock(&owner->task->pi_lock);
+ return res;
+}
/*
* Try to grab a lock, and if it is owned but the owner
* hasn't woken up yet, see if we can steal it.
@@ -1233,6 +1149,8 @@ static int __grab_lock(struct rt_mutex *
{
#ifndef CAPTURE_LOCK
return 0;
+#else
+ int res = 0;
#endif
/*
* The lock is owned, but now test to see if the owner
@@ -1241,111 +1159,36 @@ static int __grab_lock(struct rt_mutex *

TRACE_BUG_ON_LOCKED(!owner);

+ _raw_spin_lock(&owner->pi_lock);
+
/* The owner is pending on a lock, but is it this lock? */
if (owner->pending_owner != lock)
- return 0;
+ goto out_unlock;

/*
* There's an owner, but it hasn't woken up to take the lock yet.
* See if we should steal it from him.
*/
if (task->prio > owner->prio)
- return 0;
-#ifdef CONFIG_PREEMPT_RT
+ goto out_unlock;
+
/*
* The BKL is a PITA. Don't ever steal it
*/
- if (lock == &kernel_sem.lock)
- return 0;
-#endif
+ if (is_kernel_lock(lock))
+ goto out_unlock;
+
/*
* This task is of higher priority than the current pending
* owner, so we may steal it.
*/
- owner->rt_flags &= ~RT_PENDOWNER;
- owner->pending_owner = NULL;
-
-#ifdef CONFIG_DEBUG_DEADLOCKS
- /*
- * This task will be taking the ownership away, and
- * when it does, the lock can't be on the held list.
- */
- if (trace_on) {
- TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list));
- list_del_init(&lock->held_list);
- }
-#endif
- account_mutex_owner_up(owner);
-
- return 1;
-}
-
-/*
- * Bring a task from pending ownership to owning a lock.
- *
- * Return 0 if we secured it, otherwise non-zero if it was
- * stolen.
- */
-static int
-capture_lock(struct rt_mutex_waiter *waiter, struct thread_info *ti,
- struct task_struct *task)
-{
- struct rt_mutex *lock = waiter->lock;
- struct thread_info *old_owner;
- unsigned long flags;
- int ret = 0;
-
-#ifndef CAPTURE_LOCK
- return 0;
-#endif
-#ifdef CONFIG_PREEMPT_RT
- /*
- * The BKL is special, we always get it.
- */
- if (lock == &kernel_sem.lock)
- return 0;
-#endif
-
- trace_lock_irqsave(&trace_lock, flags, ti);
- /*
- * We are no longer blocked on the lock, so we are considered a
- * owner. So we must grab the lock->wait_lock first.
- */
- _raw_spin_lock(&lock->wait_lock);
- _raw_spin_lock(&task->pi_lock);
-
- if (!(task->rt_flags & RT_PENDOWNER)) {
- /*
- * Someone else stole it.
- */
- old_owner = lock_owner(lock);
- TRACE_BUG_ON_LOCKED(old_owner == ti);
- if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
- /* we got it back! */
- if (old_owner) {
- _raw_spin_lock(&old_owner->task->pi_lock);
- set_new_owner(lock, old_owner, ti __W_EIP__(waiter));
- _raw_spin_unlock(&old_owner->task->pi_lock);
- } else
- set_new_owner(lock, old_owner, ti __W_EIP__(waiter));
- ret = 0;
- } else {
- /* Add ourselves back to the list */
- TRACE_BUG_ON_LOCKED(!plist_node_empty(&waiter->list));
- plist_node_init(&waiter->list, task->prio);
- task_blocks_on_lock(waiter, ti, lock __W_EIP__(waiter));
- ret = 1;
- }
- } else {
- task->rt_flags &= ~RT_PENDOWNER;
- task->pending_owner = NULL;
- }
+ remove_pending_owner_nolock(owner);

- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
- trace_unlock_irqrestore(&trace_lock, flags, ti);
+ res = 1;

- return ret;
+ out_unlock:
+ _raw_spin_unlock(&owner->pi_lock);
+ return res;
}

static inline void INIT_WAITER(struct rt_mutex_waiter *waiter)
@@ -1366,10 +1209,25 @@ static inline void FREE_WAITER(struct rt
#endif
}

+static int allowed_to_take_lock(struct thread_info *ti,
+ task_t *task,
+ struct thread_info *old_owner,
+ struct rt_mutex *lock)
+{
+ SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));
+ SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&old_owner->task->pi_lock));
+ SMP_TRACE_BUG_ON_LOCKED(spin_is_locked(&task->pi_lock));
+
+ return !old_owner ||
+ (is_kernel_lock(lock) && lock_owner(lock) == ti) ||
+ task_is_pending_owner(ti,lock) ||
+ __grab_lock(lock, task, old_owner->task);
+}
+
/*
* lock it semaphore-style: no worries about missed wakeups.
*/
-static inline void
+static void
____down(struct rt_mutex *lock __EIP_DECL__)
{
struct thread_info *ti = current_thread_info(), *old_owner;
@@ -1379,65 +1237,66 @@ ____down(struct rt_mutex *lock __EIP_DEC

trace_lock_irqsave(&trace_lock, flags, ti);
TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
- _raw_spin_lock(&task->pi_lock);
_raw_spin_lock(&lock->wait_lock);
INIT_WAITER(&waiter);

- old_owner = lock_owner(lock);
init_lists(lock);

- if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
+ debug_lock(lock,"down");
+ /* wait to be given the lock */
+ for (;;) {
+ old_owner = lock_owner(lock);
+
+ if(allowed_to_take_lock(ti, task, old_owner,lock)) {
/* granted */
- TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
- if (old_owner) {
- _raw_spin_lock(&old_owner->task->pi_lock);
- set_new_owner(lock, old_owner, ti __EIP__);
- _raw_spin_unlock(&old_owner->task->pi_lock);
- } else
+ TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
set_new_owner(lock, old_owner, ti __EIP__);
- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
- trace_unlock_irqrestore(&trace_lock, flags, ti);
-
- FREE_WAITER(&waiter);
- return;
- }
-
- set_task_state(task, TASK_UNINTERRUPTIBLE);
+ if (!is_kernel_lock(lock)) {
+ remove_pending_owner(task);
+ }
+ debug_lock(lock,"got lock");

- plist_node_init(&waiter.list, task->prio);
- task_blocks_on_lock(&waiter, ti, lock __EIP__);
+ _raw_spin_unlock(&lock->wait_lock);
+ trace_unlock_irqrestore(&trace_lock, flags, ti);

- TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
- /* we don't need to touch the lock struct anymore */
- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
- trace_unlock_irqrestore(&trace_lock, flags, ti);
+ FREE_WAITER(&waiter);
+ return;
+ }
+
+ task_blocks_on_lock(&waiter, ti, lock, TASK_UNINTERRUPTIBLE __EIP__);

- might_sleep();
+ TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
+ /* we don't need to touch the lock struct anymore */
+ debug_lock(lock,"sleeping on");
+ _raw_spin_unlock(&lock->wait_lock);
+ trace_unlock_irqrestore(&trace_lock, flags, ti);
+
+ might_sleep();
+
+ nosched_flag = current->flags & PF_NOSCHED;
+ current->flags &= ~PF_NOSCHED;

- nosched_flag = current->flags & PF_NOSCHED;
- current->flags &= ~PF_NOSCHED;
+ if (waiter.ti)
+ {
+ schedule();
+ }
+
+ current->flags |= nosched_flag;
+ task->state = TASK_RUNNING;

-wait_again:
- /* wait to be given the lock */
- for (;;) {
- if (!waiter.ti)
- break;
- schedule();
- set_task_state(task, TASK_UNINTERRUPTIBLE);
- }
- /*
- * Check to see if we didn't have ownership stolen.
- */
- if (capture_lock(&waiter, ti, task)) {
- set_task_state(task, TASK_UNINTERRUPTIBLE);
- goto wait_again;
+ trace_lock_irqsave(&trace_lock, flags, ti);
+ _raw_spin_lock(&lock->wait_lock);
+ debug_lock(lock,"waking up on");
+ if(waiter.ti) {
+ remove_waiter(lock,&waiter,1);
+ }
+ _raw_spin_lock(&task->pi_lock);
+ task->blocked_on = NULL;
+ _raw_spin_unlock(&task->pi_lock);
}

- current->flags |= nosched_flag;
- task->state = TASK_RUNNING;
- FREE_WAITER(&waiter);
+ /* Should not get here! */
+ BUG_ON(1);
}

/*
@@ -1450,131 +1309,116 @@ wait_again:
* enables the seemless use of arbitrary (blocking) spinlocks within
* sleep/wakeup event loops.
*/
-static inline void
+static void
____down_mutex(struct rt_mutex *lock __EIP_DECL__)
{
struct thread_info *ti = current_thread_info(), *old_owner;
- unsigned long state, saved_state, nosched_flag;
+ unsigned long state, saved_state;
struct task_struct *task = ti->task;
struct rt_mutex_waiter waiter;
unsigned long flags;
- int got_wakeup = 0, saved_lock_depth;
+ int got_wakeup = 0;
+
+

trace_lock_irqsave(&trace_lock, flags, ti);
TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
- _raw_spin_lock(&task->pi_lock);
_raw_spin_lock(&lock->wait_lock);
- INIT_WAITER(&waiter);
-
- old_owner = lock_owner(lock);
- init_lists(lock);
-
- if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
- /* granted */
- TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
- if (old_owner) {
- _raw_spin_lock(&old_owner->task->pi_lock);
- set_new_owner(lock, old_owner, ti __EIP__);
- _raw_spin_unlock(&old_owner->task->pi_lock);
- } else
- set_new_owner(lock, old_owner, ti __EIP__);
- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
- trace_unlock_irqrestore(&trace_lock, flags, ti);
-
- FREE_WAITER(&waiter);
- return;
- }
-
- plist_node_init(&waiter.list, task->prio);
- task_blocks_on_lock(&waiter, ti, lock __EIP__);
-
- TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
- /*
+/*
* Here we save whatever state the task was in originally,
* we'll restore it at the end of the function and we'll
* take any intermediate wakeup into account as well,
* independently of the mutex sleep/wakeup mechanism:
*/
saved_state = xchg(&task->state, TASK_UNINTERRUPTIBLE);
+
+ INIT_WAITER(&waiter);

- /* we don't need to touch the lock struct anymore */
- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
- trace_unlock(&trace_lock, ti);
-
- /*
- * TODO: check 'flags' for the IRQ bit here - it is illegal to
- * call down() from an IRQs-off section that results in
- * an actual reschedule.
- */
-
- nosched_flag = current->flags & PF_NOSCHED;
- current->flags &= ~PF_NOSCHED;
-
- /*
- * BKL users expect the BKL to be held across spinlock/rwlock-acquire.
- * Save and clear it, this will cause the scheduler to not drop the
- * BKL semaphore if we end up scheduling:
- */
- saved_lock_depth = task->lock_depth;
- task->lock_depth = -1;
+ init_lists(lock);

-wait_again:
/* wait to be given the lock */
for (;;) {
- unsigned long saved_flags = current->flags & PF_NOSCHED;
-
- if (!waiter.ti)
- break;
- trace_local_irq_enable(ti);
- // no need to check for preemption here, we schedule().
- current->flags &= ~PF_NOSCHED;
+ old_owner = lock_owner(lock);
+
+ if (allowed_to_take_lock(ti,task,old_owner,lock)) {
+ /* granted */
+ TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
+ set_new_owner(lock, old_owner, ti __EIP__);
+ remove_pending_owner(task);
+ _raw_spin_unlock(&lock->wait_lock);
+
+ /*
+ * Only set the task's state to TASK_RUNNING if it got
+ * a non-mutex wakeup. We keep the original state otherwise.
+ * A mutex wakeup changes the task's state to TASK_RUNNING_MUTEX,
+ * not TASK_RUNNING - hence we can differenciate between the two
+ * cases:
+ */
+ state = xchg(&task->state, saved_state);
+ if (state == TASK_RUNNING)
+ got_wakeup = 1;
+ if (got_wakeup)
+ task->state = TASK_RUNNING;
+ trace_unlock_irqrestore(&trace_lock, flags, ti);
+ preempt_check_resched();

- schedule();
+ FREE_WAITER(&waiter);
+ return;
+ }
+
+ task_blocks_on_lock(&waiter, ti, lock,
+ TASK_UNINTERRUPTIBLE __EIP__);

- current->flags |= saved_flags;
- trace_local_irq_disable(ti);
- state = xchg(&task->state, TASK_UNINTERRUPTIBLE);
- if (state == TASK_RUNNING)
- got_wakeup = 1;
- }
- /*
- * Check to see if we didn't have ownership stolen.
- */
- if (capture_lock(&waiter, ti, task)) {
- state = xchg(&task->state, TASK_UNINTERRUPTIBLE);
- if (state == TASK_RUNNING)
- got_wakeup = 1;
- goto wait_again;
- }
- /*
- * Only set the task's state to TASK_RUNNING if it got
- * a non-mutex wakeup. We keep the original state otherwise.
- * A mutex wakeup changes the task's state to TASK_RUNNING_MUTEX,
- * not TASK_RUNNING - hence we can differenciate between the two
- * cases:
- */
- state = xchg(&task->state, saved_state);
- if (state == TASK_RUNNING)
- got_wakeup = 1;
- if (got_wakeup)
- task->state = TASK_RUNNING;
- trace_local_irq_enable(ti);
- preempt_check_resched();
+ TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
+ /* we don't need to touch the lock struct anymore */
+ _raw_spin_unlock(&lock->wait_lock);
+ trace_unlock(&trace_lock, ti);
+
+ if (waiter.ti) {
+ unsigned long saved_flags =
+ current->flags & PF_NOSCHED;
+ /*
+ * BKL users expect the BKL to be held across spinlock/rwlock-acquire.
+ * Save and clear it, this will cause the scheduler to not drop the
+ * BKL semaphore if we end up scheduling:
+ */

- task->lock_depth = saved_lock_depth;
- current->flags |= nosched_flag;
- FREE_WAITER(&waiter);
+ int saved_lock_depth = task->lock_depth;
+ task->lock_depth = -1;
+
+
+ trace_local_irq_enable(ti);
+ // no need to check for preemption here, we schedule().
+
+ current->flags &= ~PF_NOSCHED;
+
+ schedule();
+
+ trace_local_irq_disable(ti);
+ task->flags |= saved_flags;
+ task->lock_depth = saved_lock_depth;
+ state = xchg(&task->state, TASK_RUNNING_MUTEX);
+ if (state == TASK_RUNNING)
+ got_wakeup = 1;
+ }
+
+ trace_lock_irq(&trace_lock, ti);
+ _raw_spin_lock(&lock->wait_lock);
+ if(waiter.ti) {
+ remove_waiter(lock,&waiter,1);
+ }
+ _raw_spin_lock(&task->pi_lock);
+ task->blocked_on = NULL;
+ _raw_spin_unlock(&task->pi_lock);
+ }
}

-static void __up_mutex_waiter_savestate(struct rt_mutex *lock __EIP_DECL__);
-static void __up_mutex_waiter_nosavestate(struct rt_mutex *lock __EIP_DECL__);
-
+static void __up_mutex_waiter(struct rt_mutex *lock,
+ int savestate __EIP_DECL__);
/*
* release the lock:
*/
-static inline void
+static void
____up_mutex(struct rt_mutex *lock, int save_state __EIP_DECL__)
{
struct thread_info *ti = current_thread_info();
@@ -1585,30 +1429,31 @@ ____up_mutex(struct rt_mutex *lock, int
trace_lock_irqsave(&trace_lock, flags, ti);
TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
_raw_spin_lock(&lock->wait_lock);
+ debug_lock(lock,"upping");
TRACE_BUG_ON_LOCKED(!lock->wait_list.prio_list.prev && !lock->wait_list.prio_list.next);

-#ifdef CONFIG_DEBUG_DEADLOCKS
- if (trace_on) {
- TRACE_WARN_ON_LOCKED(lock_owner(lock) != ti);
- TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list));
- list_del_init(&lock->held_list);
- }
-#endif

#if ALL_TASKS_PI
if (plist_head_empty(&lock->wait_list))
check_pi_list_empty(lock, lock_owner(lock));
#endif
if (unlikely(!plist_head_empty(&lock->wait_list))) {
- if (save_state)
- __up_mutex_waiter_savestate(lock __EIP__);
- else
- __up_mutex_waiter_nosavestate(lock __EIP__);
- } else
+ __up_mutex_waiter(lock,save_state __EIP__);
+ debug_lock(lock,"woke up waiter");
+ } else {
+#ifdef CONFIG_DEBUG_DEADLOCKS
+ if (trace_on) {
+ TRACE_WARN_ON_LOCKED(lock_owner(lock) != ti);
+ TRACE_WARN_ON_LOCKED(list_empty(&lock->held_list));
+ list_del_init(&lock->held_list);
+ }
+#endif
lock->owner = NULL;
+ debug_lock(lock,"there was no waiters");
+ account_mutex_owner_up(ti->task);
+ }
_raw_spin_unlock(&lock->wait_lock);
#if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPT_RT)
- account_mutex_owner_up(current);
if (!current->lock_count && !rt_prio(current->normal_prio) &&
rt_prio(current->prio)) {
static int once = 1;
@@ -1841,125 +1686,103 @@ static int __sched __down_interruptible(
struct rt_mutex_waiter waiter;
struct timer_list timer;
unsigned long expire = 0;
+ int timer_installed = 0;
int ret;

trace_lock_irqsave(&trace_lock, flags, ti);
TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
- _raw_spin_lock(&task->pi_lock);
_raw_spin_lock(&lock->wait_lock);
INIT_WAITER(&waiter);

- old_owner = lock_owner(lock);
init_lists(lock);

- if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
+ ret = 0;
+ /* wait to be given the lock */
+ for (;;) {
+ old_owner = lock_owner(lock);
+
+ if (allowed_to_take_lock(ti,task,old_owner,lock)) {
/* granted */
- TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
- if (old_owner) {
- _raw_spin_lock(&old_owner->task->pi_lock);
- set_new_owner(lock, old_owner, ti __EIP__);
- _raw_spin_unlock(&old_owner->task->pi_lock);
- } else
+ TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
set_new_owner(lock, old_owner, ti __EIP__);
- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
- trace_unlock_irqrestore(&trace_lock, flags, ti);
-
- FREE_WAITER(&waiter);
- return 0;
- }
+ _raw_spin_unlock(&lock->wait_lock);
+ trace_unlock_irqrestore(&trace_lock, flags, ti);

- set_task_state(task, TASK_INTERRUPTIBLE);
+ goto out_free_timer;
+ }

- plist_node_init(&waiter.list, task->prio);
- task_blocks_on_lock(&waiter, ti, lock __EIP__);
+ task_blocks_on_lock(&waiter, ti, lock, TASK_INTERRUPTIBLE __EIP__);

- TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
- /* we don't need to touch the lock struct anymore */
- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
- trace_unlock_irqrestore(&trace_lock, flags, ti);
+ TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
+ /* we don't need to touch the lock struct anymore */
+ _raw_spin_unlock(&lock->wait_lock);
+ trace_unlock_irqrestore(&trace_lock, flags, ti);
+
+ might_sleep();
+
+ nosched_flag = current->flags & PF_NOSCHED;
+ current->flags &= ~PF_NOSCHED;
+ if (time && !timer_installed) {
+ expire = time + jiffies;
+ init_timer(&timer);
+ timer.expires = expire;
+ timer.data = (unsigned long)current;
+ timer.function = process_timeout;
+ add_timer(&timer);
+ timer_installed = 1;
+ }

- might_sleep();
+
+ if (waiter.ti) {
+ schedule();
+ }
+
+ current->flags |= nosched_flag;
+ task->state = TASK_RUNNING;

- nosched_flag = current->flags & PF_NOSCHED;
- current->flags &= ~PF_NOSCHED;
- if (time) {
- expire = time + jiffies;
- init_timer(&timer);
- timer.expires = expire;
- timer.data = (unsigned long)current;
- timer.function = process_timeout;
- add_timer(&timer);
- }
+ trace_lock_irqsave(&trace_lock, flags, ti);
+ _raw_spin_lock(&lock->wait_lock);
+ if(waiter.ti) {
+ remove_waiter(lock,&waiter,1);
+ }
+ _raw_spin_lock(&task->pi_lock);
+ task->blocked_on = NULL;
+ _raw_spin_unlock(&task->pi_lock);

- ret = 0;
-wait_again:
- /* wait to be given the lock */
- for (;;) {
- if (signal_pending(current) || (time && !timer_pending(&timer))) {
- /*
- * Remove ourselves from the wait list if we
- * didnt get the lock - else return success:
- */
- trace_lock_irq(&trace_lock, ti);
- _raw_spin_lock(&task->pi_lock);
- _raw_spin_lock(&lock->wait_lock);
- if (waiter.ti || time) {
- plist_del(&waiter.list);
- /*
- * If we were the last waiter then clear
- * the pending bit:
- */
- if (plist_head_empty(&lock->wait_list))
- lock->owner = lock_owner(lock);
- /*
- * Just remove ourselves from the PI list.
- * (No big problem if our PI effect lingers
- * a bit - owner will restore prio.)
- */
- TRACE_WARN_ON_LOCKED(waiter.ti != ti);
- TRACE_WARN_ON_LOCKED(current->blocked_on != &waiter);
- plist_del(&waiter.pi_list);
- waiter.pi_list.prio = task->prio;
- waiter.ti = NULL;
- current->blocked_on = NULL;
- if (time) {
- ret = (int)(expire - jiffies);
- if (!timer_pending(&timer)) {
- del_singleshot_timer_sync(&timer);
- ret = -ETIMEDOUT;
- }
- } else
- ret = -EINTR;
+ if(signal_pending(current)) {
+ if (time) {
+ ret = (int)(expire - jiffies);
+ if (!timer_pending(&timer)) {
+ ret = -ETIMEDOUT;
+ }
}
- _raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
- trace_unlock_irq(&trace_lock, ti);
- break;
+ else
+ ret = -EINTR;
+
+ goto out_unlock;
}
- if (!waiter.ti)
- break;
- schedule();
- set_task_state(task, TASK_INTERRUPTIBLE);
- }
-
- /*
- * Check to see if we didn't have ownership stolen.
- */
- if (!ret) {
- if (capture_lock(&waiter, ti, task)) {
- set_task_state(task, TASK_INTERRUPTIBLE);
- goto wait_again;
+ else if(timer_installed &&
+ !timer_pending(&timer)) {
+ ret = -ETIMEDOUT;
+ goto out_unlock;
}
}

- task->state = TASK_RUNNING;
- current->flags |= nosched_flag;

+ out_unlock:
+ _raw_spin_unlock(&lock->wait_lock);
+ trace_unlock_irqrestore(&trace_lock, flags, ti);
+
+ out_free_timer:
+ if (time && timer_installed) {
+ if (!timer_pending(&timer)) {
+ del_singleshot_timer_sync(&timer);
+ }
+ }
FREE_WAITER(&waiter);
return ret;
}
+
/*
* trylock for writing -- returns 1 if successful, 0 if contention
*/
@@ -1972,7 +1795,6 @@ static int __down_trylock(struct rt_mute

trace_lock_irqsave(&trace_lock, flags, ti);
TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
- _raw_spin_lock(&task->pi_lock);
/*
* It is OK for the owner of the lock to do a trylock on
* a lock it owns, so to prevent deadlocking, we must
@@ -1989,17 +1811,11 @@ static int __down_trylock(struct rt_mute
if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
/* granted */
TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
- if (old_owner) {
- _raw_spin_lock(&old_owner->task->pi_lock);
- set_new_owner(lock, old_owner, ti __EIP__);
- _raw_spin_unlock(&old_owner->task->pi_lock);
- } else
- set_new_owner(lock, old_owner, ti __EIP__);
+ set_new_owner(lock, old_owner, ti __EIP__);
ret = 1;
}
_raw_spin_unlock(&lock->wait_lock);
failed:
- _raw_spin_unlock(&task->pi_lock);
trace_unlock_irqrestore(&trace_lock, flags, ti);

return ret;
@@ -2046,16 +1862,16 @@ static int down_read_trylock_mutex(struc
}
#endif

-static void __up_mutex_waiter_nosavestate(struct rt_mutex *lock __EIP_DECL__)
+static void __up_mutex_waiter(struct rt_mutex *lock,
+ int save_state __EIP_DECL__)
{
struct thread_info *old_owner_ti, *new_owner_ti;
struct task_struct *old_owner, *new_owner;
- struct rt_mutex_waiter *w;
int prio;

old_owner_ti = lock_owner(lock);
old_owner = old_owner_ti->task;
- new_owner_ti = pick_new_owner(lock, old_owner_ti, 0 __EIP__);
+ new_owner_ti = pick_new_owner(lock, old_owner_ti, save_state __EIP__);
new_owner = new_owner_ti->task;

/*
@@ -2063,67 +1879,21 @@ static void __up_mutex_waiter_nosavestat
* to the previous priority (or to the next highest prio
* waiter's priority):
*/
- _raw_spin_lock(&old_owner->pi_lock);
- prio = old_owner->normal_prio;
- if (unlikely(!plist_head_empty(&old_owner->pi_waiters))) {
- w = plist_first_entry(&old_owner->pi_waiters, struct rt_mutex_waiter, pi_list);
- if (w->ti->task->prio < prio)
- prio = w->ti->task->prio;
- }
- if (unlikely(prio != old_owner->prio))
- pi_setprio(lock, old_owner, prio);
- _raw_spin_unlock(&old_owner->pi_lock);
-#ifdef CAPTURE_LOCK
-#ifdef CONFIG_PREEMPT_RT
- if (lock != &kernel_sem.lock) {
-#endif
- new_owner->rt_flags |= RT_PENDOWNER;
- new_owner->pending_owner = lock;
-#ifdef CONFIG_PREEMPT_RT
- }
-#endif
-#endif
- wake_up_process(new_owner);
-}
-
-static void __up_mutex_waiter_savestate(struct rt_mutex *lock __EIP_DECL__)
-{
- struct thread_info *old_owner_ti, *new_owner_ti;
- struct task_struct *old_owner, *new_owner;
- struct rt_mutex_waiter *w;
- int prio;
+ if(ALL_TASKS_PI || rt_prio(old_owner->prio)) {
+ _raw_spin_lock(&old_owner->pi_lock);

- old_owner_ti = lock_owner(lock);
- old_owner = old_owner_ti->task;
- new_owner_ti = pick_new_owner(lock, old_owner_ti, 1 __EIP__);
- new_owner = new_owner_ti->task;
+ prio = calc_pi_prio(old_owner);
+ if (unlikely(prio != old_owner->prio))
+ mutex_setprio(old_owner, prio);

- /*
- * If the owner got priority-boosted then restore it
- * to the previous priority (or to the next highest prio
- * waiter's priority):
- */
- _raw_spin_lock(&old_owner->pi_lock);
- prio = old_owner->normal_prio;
- if (unlikely(!plist_head_empty(&old_owner->pi_waiters))) {
- w = plist_first_entry(&old_owner->pi_waiters, struct rt_mutex_waiter, pi_list);
- if (w->ti->task->prio < prio)
- prio = w->ti->task->prio;
- }
- if (unlikely(prio != old_owner->prio))
- pi_setprio(lock, old_owner, prio);
- _raw_spin_unlock(&old_owner->pi_lock);
-#ifdef CAPTURE_LOCK
-#ifdef CONFIG_PREEMPT_RT
- if (lock != &kernel_sem.lock) {
-#endif
- new_owner->rt_flags |= RT_PENDOWNER;
- new_owner->pending_owner = lock;
-#ifdef CONFIG_PREEMPT_RT
+ _raw_spin_unlock(&old_owner->pi_lock);
+ }
+ if(save_state) {
+ wake_up_process_mutex(new_owner);
+ }
+ else {
+ wake_up_process(new_owner);
}
-#endif
-#endif
- wake_up_process_mutex(new_owner);
}

#ifdef CONFIG_PREEMPT_RT
@@ -2578,7 +2348,7 @@ int __lockfunc _read_trylock(rwlock_t *r
{
#ifdef CONFIG_DEBUG_RT_LOCKING_MODE
if (!preempt_locks)
- return _raw_read_trylock(&rwlock->lock.lock.debug_rwlock);
+ return _raw_read_trylock(&rwlock->lock.lock.debug_rwlock);
else
#endif
return down_read_trylock_mutex(&rwlock->lock);
@@ -2905,17 +2675,6 @@ notrace int irqs_disabled(void)
EXPORT_SYMBOL(irqs_disabled);
#endif

-/*
- * This routine changes the owner of a mutex. It's only
- * caller is the futex code which locks a futex on behalf
- * of another thread.
- */
-void fastcall rt_mutex_set_owner(struct rt_mutex *lock, struct thread_info *t)
-{
- account_mutex_owner_up(current);
- account_mutex_owner_down(t->task, lock);
- lock->owner = t;
-}

struct thread_info * fastcall rt_mutex_owner(struct rt_mutex *lock)
{
@@ -2950,7 +2709,6 @@ down_try_futex(struct rt_mutex *lock, st

trace_lock_irqsave(&trace_lock, flags, proxy_owner);
TRACE_BUG_ON_LOCKED(!raw_irqs_disabled());
- _raw_spin_lock(&task->pi_lock);
_raw_spin_lock(&lock->wait_lock);

old_owner = lock_owner(lock);
@@ -2959,16 +2717,10 @@ down_try_futex(struct rt_mutex *lock, st
if (likely(!old_owner) || __grab_lock(lock, task, old_owner->task)) {
/* granted */
TRACE_WARN_ON_LOCKED(!plist_head_empty(&lock->wait_list) && !old_owner);
- if (old_owner) {
- _raw_spin_lock(&old_owner->task->pi_lock);
- set_new_owner(lock, old_owner, proxy_owner __EIP__);
- _raw_spin_unlock(&old_owner->task->pi_lock);
- } else
set_new_owner(lock, old_owner, proxy_owner __EIP__);
ret = 1;
}
_raw_spin_unlock(&lock->wait_lock);
- _raw_spin_unlock(&task->pi_lock);
trace_unlock_irqrestore(&trace_lock, flags, proxy_owner);

return ret;
@@ -3064,3 +2816,33 @@ void fastcall init_rt_mutex(struct rt_mu
__init_rt_mutex(lock, save_state, name, file, line);
}
EXPORT_SYMBOL(init_rt_mutex);
+
+
+pid_t get_blocked_on(task_t *task)
+{
+ pid_t res = 0;
+ struct rt_mutex *lock;
+ struct thread_info *owner;
+ try_again:
+ _raw_spin_lock(&task->pi_lock);
+ if(!task->blocked_on) {
+ _raw_spin_unlock(&task->pi_lock);
+ goto out;
+ }
+ lock = task->blocked_on->lock;
+ if(!_raw_spin_trylock(&lock->wait_lock)) {
+ _raw_spin_unlock(&task->pi_lock);
+ goto try_again;
+ }
+ owner = lock_owner(lock);
+ if(owner)
+ res = owner->task->pid;
+
+ _raw_spin_unlock(&task->pi_lock);
+ _raw_spin_unlock(&lock->wait_lock);
+
+ out:
+ return res;
+
+}
+EXPORT_SYMBOL(get_blocked_on);
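
As a usage note, here is a rough user-space sketch of how the new "BlckOn:"
line in /proc/<pid>/status could be used to follow a blocking chain. It is not
part of the patch; the program name, usage and the loop bound are only
assumptions about how one might consume the interface:

/* blckon-chain.c - follow a blocking chain via /proc/<pid>/status.
 * Assumes a kernel with the patch above applied; illustrative only. */
#include <stdio.h>
#include <stdlib.h>

/* Return the pid printed on the "BlckOn:" line, or 0 if none/unreadable. */
static int blocked_on(int pid)
{
        char path[64], line[256];
        int owner = 0;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/status", pid);
        f = fopen(path, "r");
        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f))
                if (sscanf(line, "BlckOn: %d", &owner) == 1)
                        break;
        fclose(f);
        return owner;
}

int main(int argc, char **argv)
{
        int pid, steps = 0;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }
        pid = atoi(argv[1]);

        /* Print pid -> lock owner -> its lock owner ... (bounded, since the
         * chain is sampled lock by lock and could change underneath us). */
        while (pid && steps++ < 32) {
                printf("%d\n", pid);
                pid = blocked_on(pid);
        }
        return 0;
}

Run as "./blckon-chain <pid>", it prints the given pid, then the pid of the
owner of the lock that task is blocked on (as reported by get_blocked_on()),
and so on down the chain.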