Re: [PATCH -rt] ipc/sem: Rework semaphore wakeups

From: Manfred Spraul
Date: Thu Sep 15 2011 - 12:58:25 EST


On 09/14/2011 09:23 PM, Peter Zijlstra wrote:
On Wed, 2011-09-14 at 20:48 +0200, Manfred Spraul wrote:
The code does:

spin_lock()
preempt_disable();
usually_very_simple_but_worstcase_O_2
spin_unlock()
usually_very_simple_but_worstcase_O_1
preempt_enable();

with your change, it becomes:

spin_lock()
usually_very_simple_but_worstcase_O_2
usually_very_simple_but_worstcase_O_1
spin_unlock()

The complex ops remain unchanged, they are still under a lock.
preemptible lock (aka pi-mutex) on -rt, so no weird latencies.
But the change means that more operations are under spin_lock().
Acutally for a large SMP system with a simple semaphore operation, the wake_up_process() takes longer than the semaphore operation.
And for some databases, contention on the spin_lock() is an issue.


What about removing the preempt_disable?
It's only there to cover a rare race on uniprocessor preempt systems.
(a task is woken up simultaneously due to timeout of semtimedop() and a
true wakeup)

Then fix the that race - something like the attached patch [obviously
buggy - see the fixme]
sched_yield() is always a bug, as is it here. Its an life-lock if the
woken task is of higher priority than the waking task. A higher prio
FIFO task calling sched_yield() in a loop is just that, a loop, starving
the lower prio waker.

If you've got enough medium prio tasks around to occupy all other cpus,
you're got indefinite priority inversion, so even on smp its a problem.

But yeah its not the prettiest of solutions but it works.. see that
other patch with the wake-list stuff for something that ought to work
for both rt and mainline (except of course it doesn't actually work).
Wake lists are definitively the better approach.
[let's continue in that thread]

--
Manfred
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/