Re: RFC: Ideal Adaptive Spinning Conditions

From: Darren Hart
Date: Wed Mar 31 2010 - 22:25:48 EST


Peter W. Morreale wrote:
> On Wed, 2010-03-31 at 19:38 -0400, Steven Rostedt wrote:
> > On Wed, 2010-03-31 at 16:21 -0700, Darren Hart wrote:

> > > o What type of lock hold times do we expect to benefit?
> > 
> > 0 (that's a zero) :-p

> > I haven't seen your patches but you are not doing a heuristic approach,
> > are you? That is, do not "spin" hoping the lock will suddenly become
> > free. I was against that for -rt and I would be against that for futex
> > too.

> > > o How much contention is a good match for adaptive spinning?
> > >   - this is related to the number of threads to run in the test
> > > o How many spinners should be allowed?
> > > 
> > > I can share the kernel patches if people are interested, but they are
> > > really early, and I'm not sure they are of much value until I better
> > > understand the conditions where this is expected to be useful.
> > Again, I don't know how you implemented your adaptive spinners, but the
> > trick to it in -rt was that it would only spin while the owner of the
> > lock was actually running. If it was not running, it would sleep. No
> > point waiting for a sleeping task to release its lock.

> Right. This was *critical* for the adaptive rtmutex. Note in the RT
> patch, everybody spins as long as the current owner is on CPU.

Everybody spins? Really? For RT tasks I suppose that makes sense: they
will sort out the priority among themselves, and if they preempt the
owner they will all immediately schedule out and boost the owner's
priority... but then we lose the benefit of spinning, since we just put
everyone to sleep. I'll have to take a look at that and see what I'm
missing.


> FWIW, IIRC, Solaris has a heuristic approach where incoming tasks spin
> for a period of time before going to sleep. (Cray UNICOS did the same.)

I suppose a heuristic approach could still be used so long as continued spinning was conditional on the owner continuing to run on a CPU.
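
Roughly, the spin loop I'm picturing would look something like this.
This is only a sketch: owner_on_cpu() and MAX_SPINS are made-up names
for illustration, not something from my patches or from mainline.

	/*
	 * Heuristic adaptive spin: spin for at most MAX_SPINS
	 * iterations, but give up early if the lock owner is no
	 * longer running on a CPU.
	 */
	static int adaptive_spin(struct task_struct *owner, atomic_t *lock)
	{
		int spins = 0;

		while (spins++ < MAX_SPINS) {
			if (atomic_read(lock) == 0)
				return 1;	/* released; caller retries acquire */
			if (!owner_on_cpu(owner))
				return 0;	/* owner scheduled out; go to sleep */
			cpu_relax();
		}
		return 0;			/* spin budget exhausted; go to sleep */
	}

The open question is still what a sane MAX_SPINS would be, or whether
the owner-on-CPU test alone is a good enough bound for short hold times.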


> > Is this what you did? Because, IIRC, this only benefited spinlocks
> > converted to mutexes. It did not help with semaphores, because
> > semaphores could be held for a long time. Thus, it was good for short
> > held locks, but hurt performance on long held locks.


> nod. The entire premise was based on the fact that we were converting
> spinlocks, which by definition were short held locks. What I found
> during early development was that the sleep/wakeup cycle was more
> intrusive for RT than spinning.

Right, and I'm looking to provide some kernel assistance for userspace
spinlocks here, targeting short-lived critical sections as well.
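
For reference, the userspace side is roughly the usual
spin-then-futex-wait pattern below. This is a simplified sketch:
SPIN_LIMIT is arbitrary, the unlock/wake path is omitted, and plain
userspace has no way to tell whether the lock holder is still on a
CPU -- that visibility is exactly the kernel assistance I'm after.

	#include <stdatomic.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/futex.h>

	#define SPIN_LIMIT 100	/* arbitrary; tuning it is part of the question */

	/* 0 == unlocked, 1 == locked.  Simplified two-state lock: the
	 * unlock side (store 0, then FUTEX_WAKE) is omitted, as is the
	 * usual third "contended" state that avoids needless wakeups. */
	static void adaptive_lock(atomic_int *uaddr)
	{
		int i, expected;

		for (i = 0; i < SPIN_LIMIT; i++) {
			expected = 0;
			if (atomic_compare_exchange_weak(uaddr, &expected, 1))
				return;		/* acquired while spinning */
			__builtin_ia32_pause();	/* x86 pause, akin to cpu_relax() */
		}

		/* The short spin failed; block in the kernel until woken. */
		for (;;) {
			expected = 0;
			if (atomic_compare_exchange_weak(uaddr, &expected, 1))
				return;
			syscall(SYS_futex, uaddr, FUTEX_WAIT, 1, NULL, NULL, 0);
		}
	}

The kernel-assisted version would replace the fixed SPIN_LIMIT loop
with a spin that can also see whether the lock holder is still running.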


> IIRC, I measured something like 380k context switches/second prior to
> the adaptive patches for a dbench test, and we cut this down to
> somewhere around 50k, with a corresponding increase in throughput. (I
> can't remember specific numbers any more, it was a while ago... ;-)
> 
> When applied to semaphores, the benefit was in the noise range as I
> recall...
> 
> (dbench was chosen due to the heavy contention on the dcache spinlock)

Interesting, thanks for the input.

--
Darren



> Best,
> -PWM


> > If userspace is going to do this, I guess the blocked task would need
> > to go into the kernel, and spin there (with preempt enabled) if the
> > owner is still active and holding the lock.
> > 
> > Then the application would need to determine which to use: an adaptive
> > spinner for short held locks, and a normal futex for long held locks.
> > 
> > -- Steve

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team