Re: [PATCH -rt] ipc/sem: Rework semaphore wakeups

From: Manfred Spraul
Date: Wed Sep 14 2011 - 14:42:26 EST


On 09/14/2011 11:57 AM, Peter Zijlstra wrote:
> Subject: ipc/sem: Rework semaphore wakeups
> From: Peter Zijlstra<a.p.zijlstra@xxxxxxxxx>
> Date: Tue Sep 13 15:09:40 CEST 2011
>
> Current sysv sems have a weird ass wakeup scheme that involves keeping
> preemption disabled over a potential O(n^2) loop and busy waiting on
> that on other CPUs.

Have you checked that the patch improves the latency?
Note that the busy wait only happens if there is a simultaneous timeout of a semtimedop() and a true wakeup.

The code does:

spin_lock()
preempt_disable();
usually_very_simple_but_worstcase_O_2
spin_unlock()
usually_very_simple_but_worstcase_O_1
preempt_enable();

with your change, it becomes:

spin_lock()
usually_very_simple_but_worstcase_O_2
usually_very_simple_but_worstcase_O_1
spin_unlock()

The complex ops remain unchanged, they are still under a lock.
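
For illustration only, the two orderings above can be modelled in userspace roughly like this. It is a sketch, not kernel code: the pthread mutex stands in for the semaphore spinlock, there is no userspace equivalent of preempt_disable() so it is left out, and all names (sem_lock, do_step, update_current, update_patched) are made up for the example. The counters only record how much of the work runs with the lock held.

```c
#include <pthread.h>

static pthread_mutex_t sem_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_held;           /* 1 while the caller holds sem_lock */
static int steps_under_lock;    /* work items executed with the lock held */
static int steps_outside_lock;  /* work items executed after unlock */

/* Stand-in for one of the "usually very simple" work items. */
static void do_step(void)
{
	if (lock_held)
		steps_under_lock++;
	else
		steps_outside_lock++;
}

static void lock(void)
{
	pthread_mutex_lock(&sem_lock);
	lock_held = 1;
}

static void unlock(void)
{
	lock_held = 0;
	pthread_mutex_unlock(&sem_lock);
}

/* Current ordering: scan under the lock, final wakeups after unlock. */
static void update_current(void)
{
	lock();
	do_step();      /* scan of the pending operations */
	unlock();
	do_step();      /* final wakeups, lock already dropped */
}

/* Ordering with the proposed change: both parts under the lock. */
static void update_patched(void)
{
	lock();
	do_step();      /* scan of the pending operations */
	do_step();      /* final wakeups, lock still held */
	unlock();
}
```

Calling update_current() once leaves one step counted on each side of the unlock; update_patched() counts both of its steps under the lock, which is the longer hold time I am asking about.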

What about removing the preempt_disable?
It's only there to cover a rare race on uniprocessor preempt systems.
(a task is woken up simultaneously due to timeout of semtimedop() and a true wakeup)

Then fix that race - something like the attached patch [obviously buggy - see the fixme]
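
To make the window concrete, here is a rough userspace model of the two-step status handoff. It is a sketch under assumptions, not the kernel implementation: C11 atomics and pthreads replace the kernel primitives, the values of IN_WAKEUP and WAITING are picked for the example, and wake_prepare, wake_finish, get_result and run_demo are hypothetical names. The waiter yields instead of pure busy-waiting, which is the idea behind the sched_yield() in the patch.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define IN_WAKEUP 1   /* assumed value; "wakeup in flight" marker */
#define WAITING   2   /* hypothetical "still sleeping" marker */

struct waiter {
	atomic_int status;
};

/* Step 1, done with the lock held: mark the wakeup as in flight. */
static void wake_prepare(struct waiter *w)
{
	atomic_store_explicit(&w->status, IN_WAKEUP, memory_order_relaxed);
}

/* Step 2, done after the lock is dropped: publish the final result. */
static void wake_finish(struct waiter *w, int error)
{
	atomic_store_explicit(&w->status, error, memory_order_release);
}

/* Waiter side: yield the CPU until the waker has published the result. */
static int get_result(struct waiter *w)
{
	int error = atomic_load_explicit(&w->status, memory_order_acquire);

	while (error == IN_WAKEUP) {
		sched_yield();
		error = atomic_load_explicit(&w->status, memory_order_acquire);
	}
	return error;
}

/* Thread body that completes the wakeup with result 0 (success). */
static void *finish_thread(void *arg)
{
	wake_finish(arg, 0);
	return NULL;
}

/*
 * Puts the waiter into the IN_WAKEUP window, completes the wakeup from a
 * second thread and returns what the waiter observed (-1 if the thread
 * could not be created).
 */
static int run_demo(void)
{
	struct waiter w;
	pthread_t t;
	int result;

	atomic_init(&w.status, WAITING);
	wake_prepare(&w);
	if (pthread_create(&t, NULL, finish_thread, &w) != 0)
		return -1;
	result = get_result(&w);
	pthread_join(t, NULL);
	return result;
}
```

The waiter can only leave the loop once wake_finish() has run, so if the waker is preempted between the two steps, the waiter keeps spinning (or yielding) for that long.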

--
Manfred
diff --git a/ipc/sem.c b/ipc/sem.c
index add93d2..96aef6d 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -416,11 +416,13 @@ static void wake_up_sem_queue_prepare(struct list_head *pt,
struct sem_queue *q, int error)
{
if (list_empty(pt)) {
+#ifndef CONFIG_PREEMPT_RT_BASE
/*
* Hold preempt off so that we don't get preempted and have the
* wakee busy-wait until we're scheduled back on.
*/
preempt_disable();
+#endif
}
q->status = IN_WAKEUP;
q->pid = error;
@@ -449,8 +451,11 @@ static void wake_up_sem_queue_do(struct list_head *pt)
smp_wmb();
q->status = q->pid;
}
- if (did_something)
+ if (did_something) {
+#ifndef CONFIG_PREEMPT_RT_BASE
preempt_enable();
+#endif
+ }
}

static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
@@ -1280,6 +1285,13 @@ static int get_queue_result(struct sem_queue *q)
error = q->status;
while (unlikely(error == IN_WAKEUP)) {
cpu_relax();
+#ifdef CONFIG_PREEMPT_RT_BASE
+ /*FIXME: obviously broken if called with semaphore spinlock held
+ * sched_yield() should only be called if get_queue_result() is
+ * called outside of the semaphore lock
+ */
+ sched_yield();
+#endif
error = q->status;
}