Re: [PATCH 1/2] ipc semaphores: reduce ipc_lock contention in semtimedop

From: Manfred Spraul
Date: Wed Apr 14 2010 - 12:16:54 EST


On 04/13/2010 08:19 PM, Chris Mason wrote:
> On Wed, Apr 14, 2010 at 04:09:45AM +1000, Nick Piggin wrote:
>> On Tue, Apr 13, 2010 at 01:39:41PM -0400, Chris Mason wrote:
>> The other thing I don't know if your patch gets right is requeueing one
>> of the operations. When you requeue from one list to another, then you
>> seem to lose ordering with other pending operations, so that would
>> seem to break the API as well (can't remember if the API strictly
>> mandates FIFO, but anyway it can open up starvation cases).
> I don't see anything in the docs about the FIFO order. I could add an
> extra sort on sequence number pretty easily, but is the starvation case
> really that bad?

How do you want to determine the sequence number?
Is atomic_inc_return() on a per-semaphore array counter sufficiently fast?
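
To make the question concrete, here is a minimal userspace sketch of the idea under discussion, not code from the patch. It assumes one counter per semaphore array and uses C11 atomic_fetch_add() as a stand-in for the kernel's atomic_inc_return(). The names sem_array_sketch, pending_op, take_seq() and queue_in_order() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a semaphore array; not the kernel's struct sem_array. */
struct sem_array_sketch {
    atomic_ulong next_seq;   /* one counter shared by every semaphore in the array */
};

/* Hypothetical pending operation, tagged with its arrival order. */
struct pending_op {
    unsigned long seq;
    struct pending_op *next;
};

/* Userspace analogue of atomic_inc_return(): returns the incremented value. */
static unsigned long take_seq(struct sem_array_sketch *sma)
{
    return atomic_fetch_add(&sma->next_seq, 1) + 1;
}

/*
 * Insert op into a singly linked list kept in ascending seq order, so an
 * operation that is requeued from another list lands back in its original
 * arrival position instead of at the tail.
 */
static void queue_in_order(struct pending_op **head, struct pending_op *op)
{
    assert(head != NULL && op != NULL);
    while (*head != NULL && (*head)->seq < op->seq)
        head = &(*head)->next;
    op->next = *head;
    *head = op;
}
```

Every sleeping operation would call take_seq() once, which puts one atomic read-modify-write on a cache line shared by all users of the array. Whether that is cheap enough is exactly the open question.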

>> I was looking at doing a sequence number to be able to sort these, but
>> it ended up getting over complex (and SAP was only using simple ops so
>> it didn't seem to need much better).
>>
>> We want to be careful not to change semantics at all. And it gets
>> tricky quickly :( What about Zach's simpler wakeup API?
> Yeah, that's why my patches include code to handle userland sending
> duplicate semids. Zach's simpler API is cooking too, but if I can get
> this done without insane complexity it helps with more than just the
> post/wait oracle workload.

What is the oracle workload, which multi-sembuf operations does it use?
How many semaphores are in one array?

When the last optimizations were written, I searched a bit:
- postgres uses per-process semaphores, with small semaphore arrays.
[process sleeps on its own semaphore and is woken up by someone else when it can make progress]
- With Google, I couldn't find anything relevant that uses multi-sembuf semop() calls.
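
For reference, the difference between the two call shapes can be sketched as below. This is only an illustration of the System V API, not a claim about what postgres or Oracle actually do. The helper names build_wait_self(), build_wait_pair() and submit() are hypothetical, and semid is assumed to come from an earlier semget().

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Simple operation: one sembuf. This is the "sleep on my own semaphore"
 * pattern; `self` is the (hypothetical) index owned by the calling process.
 */
size_t build_wait_self(struct sembuf *ops, unsigned short self)
{
    ops[0] = (struct sembuf){ .sem_num = self, .sem_op = -1, .sem_flg = 0 };
    return 1;
}

/*
 * Multi-sembuf operation: decrement semaphores a and b in one call.
 * semop() applies the whole set atomically, or sleeps until it can.
 */
size_t build_wait_pair(struct sembuf *ops, unsigned short a, unsigned short b)
{
    ops[0] = (struct sembuf){ .sem_num = a, .sem_op = -1, .sem_flg = 0 };
    ops[1] = (struct sembuf){ .sem_num = b, .sem_op = -1, .sem_flg = 0 };
    return 2;
}

/* Submit either kind of operation to the semaphore array identified by semid. */
int submit(int semid, struct sembuf *ops, size_t nops)
{
    return semop(semid, ops, nops);
}
```

Only the second shape forces the kernel to consider several semaphores at once when deciding whether a sleeper can proceed, which is why it matters for any per-semaphore queueing scheme.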

And I agree with Nick: We should be careful about changing the API.

--
Manfred