Re: [rfc][patch] queued spinlocks (i386)

From: Ravikiran G Thirumalai
Date: Fri Mar 23 2007 - 15:27:42 EST


On Fri, Mar 23, 2007 at 10:40:17AM +0100, Eric Dumazet wrote:
> On Fri, 23 Mar 2007 09:59:11 +0100
> Nick Piggin <npiggin@xxxxxxx> wrote:
>
> >
> > Implement queued spinlocks for i386. This shouldn't increase the size of
> > the spinlock structure, while still able to handle 2^16 CPUs.
> >
> > Not completely implemented with assembly yet, to make the algorithm a bit
> > clearer.
> >
> > The queued spinlock has 2 fields, a head and a tail, which are indexes
> > into a FIFO of waiting CPUs. To take a spinlock, a CPU performs an
> > "atomic_inc_return" on the head index, and keeps the returned value as
> > a ticket. The CPU then spins until the tail index is equal to that
> > ticket.
> >
> > To unlock a spinlock, the tail index is incremented (this can be non
> > atomic, because only the lock owner will modify tail).
> >
> > Implementation inefficiencies aside, this change should have little
> > effect on performance for uncontended locks, but will have quite a
> > large cost for highly contended locks [O(N) cacheline transfers vs
> > O(1) per lock aquisition, where N is the number of CPUs contending].
> > The benefit is is that contended locks will not cause any starvation.
> >
> > Just an idea. Big NUMA hardware seems to have fairness logic that
> > prevents starvation for the regular spinlock logic. But it might be
> > interesting for -rt kernel or systems with starvation issues.
>
> It's a very nice idea Nick.

Amen to that.

>
> You also have for free the number or cpus that are before you.
>
> On big SMP/NUMA, we could use this information to call a special lock_cpu_relax() function to avoid cacheline transferts.
>
> asm volatile(LOCK_PREFIX "xaddw %0, %1\n\t"
> : "+r" (pos), "+m" (lock->qhead) : : "memory");
> for (;;) {
> unsigned short nwait = pos - lock->qtail;
> if (likely(nwait == 0))
> break;
> lock_cpu_relax(lock, nwait);
> }
>
> lock_cpu_relax(raw_spinlock_t *lock, unsigned int nwait)
> {
> unsigned int cycles = nwait * lock->min_cycles_per_round;
> busy_loop(cycles);
> }

Good Idea. Hopefully, this should reduce the number of cacheline transfers
in the contended case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/