Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptivespinning

From: Avi Kivity
Date: Mon Apr 05 2010 - 18:22:43 EST


On 04/06/2010 12:54 AM, Darren Hart wrote:
Avi Kivity wrote:
On 04/05/2010 11:23 PM, Darren Hart wrote:
In-Reply-To:

NOT FOR INCLUSION

The following patch series implements a new experimental kernel side futex mutex
via new FUTEX_LOCK and FUTEX_LOCK_ADAPTIVE futex op codes. The adaptive spin
follows the kernel mutex model of allowing one spinner until the lock is
released or the owner is descheduled. The patch currently allows the user to
specify if they want no spinning, a single adaptive spinner, or multiple
spinners (aggressive adaptive spinning, or aas... which I have mistyped as "ass"
enough times to realize a better term is indeed required :-).

An interesting (but perhaps difficult to achieve) optimization would be to spin in userspace.

I couldn't think of a lightweight way to determine when the owner has been scheduled out in userspace. Kernel assistance is required. You could do this on the schedule() side of things, but I figured I'd get some strong pushback if I tried to add a hook into descheduling that flipped a bit in the futex value stating the owner was about to deschedule(). Still, that might be something to explore.

In the futex value it's hopeless (since a thread can hold many locks), but I don't think it's unreasonable to set a bit in the thread local storage area. The futex format would then need to be extended to contain a pointer to this bit.




How many cores (or hardware threads) does this machine have?

Sorry, I meant to include that. I tested on an 8 CPU (no hardware threads) 2.6 GHz Opteron 2218 (2 QuadCore CPUs) system.

> At 10%
duty cycle you have 25 waiters behind the lock on average. I don't think this is realistic, and it means that spinning is invoked only rarely.

Perhaps some instrumentation is in order, it seems to get invoked enough to achieve some 20% increase in lock/unlock iterations. Perhaps another metric would be of more value - such as average wait time?

Why measure an unrealistic workload?


I'd be interested in seeing runs where the average number of waiters is 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.
25 average waiters on compute bound code means the application needs to be rewritten, no amount of mutex tweaking will help it.

Perhaps something NR_CPUS threads would be of more interest?

That seems artificial.

At 10% that's about .8 and at 25% the 2 of your upper limit. I could add a few more duty-cycle points and make 25% the max. I'll kick that off and post the results... probably tomorrow, 10M iterations takes a while, but makes the results relatively stable.

Thanks. But why not vary the number of threads as well?


Does the wakeup code select the spinning waiter, or just a random waiter?

The wakeup code selects the highest priority task in fifo order to wake-up - however, under contention it is most likely going to go back to sleep as another waiter will steal the lock out from under it. This locking strategy is unashamedly about as "unfair" as it gets.

Best to avoid the wakeup if we notice the lock was stolen.


--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/