Re: [PATCH] [patch 4a/4] ipc: sem optimise simple operations

From: Manfred Spraul
Date: Sat Aug 15 2009 - 06:09:24 EST


This is a multi-part message in MIME format.On 08/15/2009 06:52 AM, Nick Piggin wrote:

The problem with using the same algorithm is that we don't need to
"restart scanning" the simple-op lists when one of them succeeds.
This is because if a negative op fails, then no amount of subsequent
simple negative ops will make it succeed.

With multi-ops, it can be adding and subtracting and waiting for
zero of different sems etc, so in order to try to stay strictly
FIFO, then we always have to recheck all previous failed ops after
one succeeds.
What prevents us from optimizing the complex algorithm?

Right now, the rule is "restart if q->alter".
What about "restart if at least one operation was an increment or if at least one
semaphore got 0"?

The other problem is that we have to scan all ops in the multi-op
list because we don't know what combination of sem values might
allow the op to succeed. With the single-op lists, we know if the
semval is 0, then we don't have to scan the negative list, and if
it is non-0 then we don't have to scan the zero list.
If semval is 0, then the negative list is empty - it doesn't cost anything to scan it, especially if the negative list is the first part of a joint list.

But you describe one difference between the simple and complex operations:
- complex "increment" operations can be in the queue.
- simple "increment" operations always succeed immediately.

Thus it is possible to stop scanning the simple queue if an "alter" operation is found and the semaphore value is 0.
For the complex queue, this optimization doesn't appear to be that simple.

AFAICS this is a realistic case:
- 1000 threads are waiting with "decrement by 1".
- semval is 0.
- an increment by 1 arrives.

I've optimized that case, too - it's just one test.

With both optimizations, I see only one case left where your algorithm is better:
- semaphore value 1.
- 1000 threads waiting for zero.
- an increment by 1 arrives.

My code would scan the 1000 entries, your code doesn't.
But: Does that exist in real life?

Btw,
- semaphore value 1
- 1000 threads with "decrement 2" are waiting
- an decrement by 1 arrives.
My code doesn't scan at all, AFAICS you would scan all 1000 entries.
The same argument applies:
I would bet that this case doesn't exist in real life.

Could you create a dump of the pending list from a test run?
Does your database use "wait for zero", or increment/decrement operations with offsets that are not +-1?
Do you see a common case where separate algorithms are necessary?

Combine these two problems and that is where the O(n^2) behaviour
comes in to the complex-op algorithm.
Yes - the worst case remains O(n^2), but only if there are increments.
(to be fair: with the optimizations O(n*m) with n waiting tasks and m increments)

But OTHO neither of our patches solves that.

--
Manfred