Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t

From: Darren Hart
Date: Sun Jul 11 2010 - 11:10:25 EST


On 07/11/2010 06:33 AM, Mike Galbraith wrote:
> On Sat, 2010-07-10 at 21:41 +0200, Mike Galbraith wrote:
> > On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:

> > > If we can't move the unlock above before set_owner, then we may need a:
> > >
> > > retry:
> > >         cur->lock();
> > >         top_waiter = get_top_waiter();
> > >         cur->unlock();
> > >
> > >         double_lock(cur, top_waiter);
> > >         if (top_waiter != get_top_waiter()) {
> > >                 double_unlock(cur, top_waiter);
> > >                 goto retry;
> > >         }
> > >
> > > Not ideal, but I think I prefer that to making all the hb locks raw.
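Spelled out as a compilable user-space sketch, with pthread mutexes standing in for the rtmutex wait_lock and a per-waiter lock (the rt_mutex_ish/lock_mutex_and_top names and the fixed waiter-lock before wait_lock ordering are invented for illustration, not kernel API), the retry above would look like:

	#include <pthread.h>

	struct waiter {
		pthread_mutex_t lock;
	};

	struct rt_mutex_ish {
		pthread_mutex_t wait_lock;
		struct waiter *top_waiter;
	};

	/*
	 * Lock both the mutex and its current top waiter. The assumed lock
	 * order is waiter->lock before wait_lock, so we cannot look up the
	 * top waiter and lock it while holding wait_lock: peek, drop, take
	 * both, then revalidate the snapshot.
	 */
	static struct waiter *lock_mutex_and_top(struct rt_mutex_ish *m)
	{
		struct waiter *top;

		for (;;) {
			pthread_mutex_lock(&m->wait_lock);
			top = m->top_waiter;
			pthread_mutex_unlock(&m->wait_lock);

			if (!top)
				return NULL;

			/*
			 * double_lock(): waiter lock first, then wait_lock.
			 * A real version must also keep *top alive across
			 * this unlocked window.
			 */
			pthread_mutex_lock(&top->lock);
			pthread_mutex_lock(&m->wait_lock);

			if (top == m->top_waiter)
				return top;	/* snapshot held up; both locks held */

			/* Top waiter changed while we held neither lock. */
			pthread_mutex_unlock(&m->wait_lock);
			pthread_mutex_unlock(&top->lock);
		}
	}

The caller drops both locks when done; the point is only that a snapshot taken without both locks held must be revalidated once they are.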

> Another option: only scratch the itchy spot.

> futex: non-blocking synchronization point for futex_wait_requeue_pi() and futex_requeue().
>
> Problem analysis by Darren Hart:
> The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates
> a scenario where a task can wake up, not knowing it has been enqueued on an
> rtmutex. In order to detect this, the task would have to be able to take either
> task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately,
> without already holding one of these, the pi_blocked_on variable can change
> from NULL to valid or from valid to NULL. Therefore, the task cannot be allowed
> to take a sleeping lock after wakeup, or it could end up trying to block on two
> locks, the second overwriting a valid pi_blocked_on value. This obviously
> breaks the PI mechanism.
>
> Rather than convert the hb lock to a raw spinlock, do so only in the spot where
> blocking cannot be allowed, i.e. before we know that lock handoff has completed.

I like it. I especially like that the change is only evident if you are using the code path that introduced the problem in the first place. If you're doing a lot of requeue_pi operations, then the waking waiters have an advantage over new pending waiters or other tasks with futexes keyed to the same hash bucket... but that seems acceptable to me.

I'd like to first confirm that holding the pendowner->pi_lock across the wakeup in wakeup_next_waiter() isn't feasible. If it can work, I think the impact would be lower. I'll have a look tomorrow.
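For reference, the shape of that alternative would be roughly the following (a hypothetical fragment, not the actual wakeup_next_waiter() body; it only illustrates how far the pi_lock critical section would have to extend):

	/* In the waker: */
	raw_spin_lock(&pendowner->pi_lock);
	/* ... hand off rtmutex ownership, fix up pendowner's PI state ... */
	wake_up_process(pendowner);
	raw_spin_unlock(&pendowner->pi_lock);

The woken task would then take its own pi_lock before deciding whether it may block again, serializing against the waker without touching hb->lock at all.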

Nice work, Mike.

--
Darren

> Signed-off-by: Mike Galbraith <efault@xxxxxx>
> Cc: Darren Hart <dvhltc@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxx>
> Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx>
> Cc: John Kacur <jkacur@xxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>

> diff --git a/kernel/futex.c b/kernel/futex.c
> index a6cec32..ef489f3 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2255,7 +2255,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
>  	/* Queue the futex_q, drop the hb lock, wait for wakeup. */
>  	futex_wait_queue_me(hb, &q, to);
>  
> -	spin_lock(&hb->lock);
> +	/*
> +	 * Non-blocking synchronization point with futex_requeue().
> +	 *
> +	 * We dare not block here because this will alter PI state, possibly
> +	 * before our waker finishes modifying same in wakeup_next_waiter().
> +	 */
> +	while (!spin_trylock(&hb->lock))
> +		cpu_relax();
>  	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
>  	spin_unlock(&hb->lock);
>  	if (ret)



--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team