Re: [PATCH RFC ticketlock] v3 Auto-queued ticketlock

From: Paul E. McKenney
Date: Thu Jun 13 2013 - 11:23:07 EST


On Thu, Jun 13, 2013 at 10:55:41AM +0800, Lai Jiangshan wrote:
> On 06/12/2013 11:40 PM, Paul E. McKenney wrote:
> > Breaking up locks is better than implementing high-contention locks, but
> > if we must have high-contention locks, why not make them automatically
> > switch between light-weight ticket locks at low contention and queued
> > locks at high contention? After all, this would remove the need for
> > the developer to predict which locks will be highly contended.
> >
> > This commit allows ticket locks to automatically switch between pure
> > ticketlock and queued-lock operation as needed. If too many CPUs are
> > spinning on a given ticket lock, a queue structure will be allocated
> > and the lock will switch to queued-lock operation. When the lock becomes
> > free, it will switch back into ticketlock operation. The low-order bit
> > of the head counter is used to indicate that the lock is in queued mode,
> > which forces an unconditional mismatch between the head and tail counters.
> > This approach means that the common-case code path under conditions of
> > low contention is very nearly that of a plain ticket lock.
> >
> > A fixed number of queueing structures is statically allocated in an
> > array. The ticket-lock address is used to hash into an initial element,
> > but if that element is already in use, the search moves to the next element.
> > If the entire array is already in use, the lock continues to spin in ticket mode.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > [ paulmck: Eliminate duplicate code and update comments (Steven Rostedt). ]
> > [ paulmck: Address Eric Dumazet review feedback. ]
> > [ paulmck: Use Lai Jiangshan idea to eliminate smp_mb(). ]
> > [ paulmck: Expand ->head_tkt from s32 to s64 (Waiman Long). ]
> > [ paulmck: Move cpu_relax() to main spin loop (Steven Rostedt). ]
> > [ paulmck: Reduce queue-switch contention (Waiman Long). ]
> > [ paulmck: __TKT_SPIN_INC for __ticket_spin_trylock() (Steffen Persvold). ]
> > [ paulmck: Type safety fixes (Steven Rostedt). ]
> > [ paulmck: Pre-check cmpxchg() value (Waiman Long). ]
> > [ paulmck: smp_mb() downgrade to smp_wmb() (Lai Jiangshan). ]
>
>
> Hi, Paul,
>
> I simplified the code and removed the search by encoding the index of struct tkt_q_head
> into lock->tickets.head.
>
> A) lock->tickets.head (when in queued-lock mode):
> ---------------------------------
> index of struct tkt_q_head |0|1|
> ---------------------------------

Interesting approach! It might reduce queued-mode overhead a bit in
some cases, though I bet that in the common case the first queue element
examined is the right one. More on this below.

> Bit 0 = 1 marks the lock as queued, and bit 1 = 0 is reserved for __ticket_spin_unlock(),
> so that __ticket_spin_unlock() will not change the higher bits of lock->tickets.head.
>
> B) tqhp->head holds the real value of lock->tickets.head.
> If the low-order bit of tqhp->head is 1, it is still the head ticket from when queueing started.
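
To check that I am reading A) and B) correctly, here is a quick userspace
toy of the encoding.  The helper names, the TOY_NQUEUES value, and the
assumption that __TKT_SPIN_INC is 2 are mine, not from your patch:

#include <assert.h>
#include <stdio.h>

typedef unsigned short toy_ticket_t;	/* stands in for __ticket_t */

#define TOY_NQUEUES	16	/* toy value; the patch bounds the real one with BUILD_BUG_ON() */
#define TOY_SPIN_INC	2	/* assumed value of __TKT_SPIN_INC */

/* A) encoded lock->tickets.head: | index | bit1 = 0 | bit0 = 1 | */
static toy_ticket_t toy_head_encode(int idx)
{
	assert(idx >= 0 && idx < TOY_NQUEUES);
	return ((toy_ticket_t)idx << 2) | 0x1;	/* matches "(i << 2) + 0x1" below */
}

static int toy_head_queued(toy_ticket_t head)
{
	return head & 0x1;
}

static int toy_head_index(toy_ticket_t head)
{
	return head >> 2;	/* matches tkt_q_find_head() below */
}

int main(void)
{
	toy_ticket_t head = toy_head_encode(5);

	/* One __ticket_spin_unlock()-style add would land in reserved bit 1 ... */
	head += TOY_SPIN_INC;
	/* ... leaving the index and the queued bit untouched: */
	printf("queued=%d index=%d\n", toy_head_queued(head), toy_head_index(head));

	/*
	 * B) tqhp->head carries the real head value; bit 0 set means it is
	 * still the head ticket from when queueing started.  tkt_q_do_wake()
	 * below clears that bit and advances by one ticket:
	 */
	toy_ticket_t real_head = 42 | 0x1;
	real_head = (real_head & ~0x1) + TOY_SPIN_INC;
	printf("real head after first wake: %d\n", real_head);	/* prints 44 */
	return 0;
}

If that matches your intent, bit 1 absorbs exactly one unlock increment
until tkt_q_do_wake() subtracts it back off.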

But don't you also need the xadd() in __ticket_spin_unlock() to become
a cmpxchg() for this to work? Or is your patch missing your changes to
arch/x86/include/asm/spinlock.h? Either way, this is likely to increase
the no-contention overhead, which might be counterproductive. Wouldn't
hurt to get measurements, though.
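
To make the overhead question concrete, here is a userspace sketch of the
two unlock shapes I have in mind.  Neither is from your patch (the
spinlock.h side isn't shown in it), and the increment of 2 is again my
assumption:

#include <stdint.h>
#include <stdio.h>

struct toy_tickets {
	uint16_t head;
	uint16_t tail;
};

/* (a) today's unlock: one unconditional atomic add on the head */
static void toy_unlock_xadd(struct toy_tickets *t)
{
	__atomic_fetch_add(&t->head, 2, __ATOMIC_RELEASE);
}

/* (b) a cmpxchg-based unlock, if a blind add could not be tolerated */
static void toy_unlock_cmpxchg(struct toy_tickets *t)
{
	uint16_t old = __atomic_load_n(&t->head, __ATOMIC_RELAXED);
	uint16_t new;

	do {
		/* a real version would presumably hand off when bit 0 is set */
		new = old + 2;
	} while (!__atomic_compare_exchange_n(&t->head, &old, new, 0,
					      __ATOMIC_RELEASE,
					      __ATOMIC_RELAXED));
}

int main(void)
{
	struct toy_tickets t = { .head = 0, .tail = 2 };

	toy_unlock_xadd(&t);		/* head: 0 -> 2 */
	toy_unlock_cmpxchg(&t);		/* head: 2 -> 4, with an extra load */
	printf("head=%u tail=%u\n", t.head, t.tail);
	return 0;
}

Even uncontended, (b) pays at least one extra load and a possible retry on
every release, which is exactly the sort of thing the measurements should
show.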

Given the results that Davidlohr posted, I believe that the following
optimizations would also provide some improvement:

1. Move the call to tkt_spin_pass() from __ticket_spin_lock()
to a separate linker section in order to reduce the icache
penalty exacted by the spinloop. This is likely to be causing
some of the performance reductions in the cases where ticket
locks are not highly contended. (A rough sketch follows this
list.)

2. Limit the number of elements searched in the array of
queues. However, this would help only if a number of ticket
locks were in queued mode at the same time.

3. Dynamically allocate the queue array at boot. This might
also reduce cache pressure, again, at least in cases where
there are a number of ticket locks in queued mode at the
same time.
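
For #1, the shape I have in mind is roughly the following (userspace toy;
the attribute set, the section name, and the toy_* names are placeholders,
not what the kernel code would end up using):

#include <stdatomic.h>
#include <stdbool.h>

struct toy_lock {
	atomic_uint head;
	atomic_uint tail;
};

/*
 * Rare path: noinline plus a section of its own keeps the queueing code
 * out of the icache lines that the inlined fast path touches (__cold
 * could also help the compiler lay out the call site).
 */
__attribute__((noinline, section(".text.tkt_slow")))
static bool toy_spin_pass(struct toy_lock *lock, unsigned int mytail)
{
	/* queue setup and queued spinning would go here */
	return atomic_load(&lock->head) == mytail;
}

static inline void toy_spin_lock(struct toy_lock *lock)
{
	unsigned int mytail = atomic_fetch_add(&lock->tail, 1);

	while (atomic_load(&lock->head) != mytail) {
		if (toy_spin_pass(lock, mytail))	/* out of line, rarely taken */
			return;
		__builtin_ia32_pause();			/* x86 stand-in for cpu_relax() */
	}
}

int main(void)
{
	struct toy_lock lock = { .head = 0, .tail = 0 };

	toy_spin_lock(&lock);			/* uncontended: fast path only */
	atomic_fetch_add(&lock.head, 1);	/* toy unlock */
	return 0;
}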

Frederic just reminded me that I owe him some energy-efficiency improvements
for adaptive ticks, so I won't get to these very quickly. Please feel free
to take these on -- the patch clearly does well under high contention, so
reducing the no-contention penalty could really help.

Thanx, Paul

> Thanks,
> Lai
>
> kernel/tktqlock.c | 51 +++++++++++++--------------------------------------
> 1 files changed, 13 insertions(+), 38 deletions(-)
>
> diff --git a/kernel/tktqlock.c b/kernel/tktqlock.c
> index 912817c..1329d0f 100644
> --- a/kernel/tktqlock.c
> +++ b/kernel/tktqlock.c
> @@ -33,7 +33,7 @@ struct tkt_q {
>
> struct tkt_q_head {
> arch_spinlock_t *ref; /* Pointer to spinlock if in use. */
> - s64 head_tkt; /* Head ticket when started queuing. */
> + __ticket_t head; /* Real head when queued. */
> struct tkt_q *spin; /* Head of queue. */
> struct tkt_q **spin_tail; /* Tail of queue. */
> };
> @@ -77,15 +77,8 @@ static unsigned long tkt_q_hash(arch_spinlock_t *lock)
> */
> static struct tkt_q_head *tkt_q_find_head(arch_spinlock_t *lock)
> {
> - int i;
> - int start;
> -
> - start = i = tkt_q_hash(lock);
> - do
> - if (ACCESS_ONCE(tkt_q_heads[i].ref) == lock)
> - return &tkt_q_heads[i];
> - while ((i = tkt_q_next_slot(i)) != start);
> - return NULL;
> + BUILD_BUG_ON(TKT_Q_NQUEUES > (1 << (TICKET_SHIFT - 2)));
> + return &tkt_q_heads[ACCESS_ONCE(lock->tickets.head) >> 2];
> }
>
> /*
> @@ -101,11 +94,11 @@ static bool tkt_q_try_unqueue(arch_spinlock_t *lock, struct tkt_q_head *tqhp)
>
> /* Pick up the ticket values. */
> asold = ACCESS_ONCE(*lock);
> - if ((asold.tickets.head & ~0x1) == asold.tickets.tail) {
> + if (tqhp->head == asold.tickets.tail) {
>
> /* Attempt to mark the lock as not having a queue. */
> asnew = asold;
> - asnew.tickets.head &= ~0x1;
> + asnew.tickets.head = tqhp->head;
> if (cmpxchg(&lock->head_tail,
> asold.head_tail,
> asnew.head_tail) == asold.head_tail) {
> @@ -128,12 +121,9 @@ void tkt_q_do_wake(arch_spinlock_t *lock)
> struct tkt_q_head *tqhp;
> struct tkt_q *tqp;
>
> - /*
> - * If the queue is still being set up, wait for it. Note that
> - * the caller's xadd() provides the needed memory ordering.
> - */
> - while ((tqhp = tkt_q_find_head(lock)) == NULL)
> - cpu_relax();
> + tqhp = tkt_q_find_head(lock);
> + ACCESS_ONCE(lock->tickets.head) -= __TKT_SPIN_INC;
> + ACCESS_ONCE(tqhp->head) = (tqhp->head & ~0x1) + __TKT_SPIN_INC;
>
> for (;;) {
>
> @@ -145,9 +135,7 @@ void tkt_q_do_wake(arch_spinlock_t *lock)
> return; /* No element, successfully removed queue. */
> cpu_relax();
> }
> - if (ACCESS_ONCE(tqhp->head_tkt) != -1)
> - ACCESS_ONCE(tqhp->head_tkt) = -1;
> - smp_mb(); /* Order pointer fetch and assignment against handoff. */
> + smp_mb(); /* Order modification, pointer fetch and assignment against handoff. */
> ACCESS_ONCE(tqp->cpu) = -1;
> }
> EXPORT_SYMBOL(tkt_q_do_wake);
> @@ -169,10 +157,7 @@ bool tkt_q_do_spin(arch_spinlock_t *lock, struct __raw_tickets inc)
> */
> smp_mb(); /* See above block comment. */
>
> - /* If there no longer is a queue, leave. */
> tqhp = tkt_q_find_head(lock);
> - if (tqhp == NULL)
> - return false;
>
> /* Initialize our queue element. */
> tq.cpu = raw_smp_processor_id();
> @@ -180,9 +165,8 @@ bool tkt_q_do_spin(arch_spinlock_t *lock, struct __raw_tickets inc)
> tq.next = NULL;
>
> /* Check to see if we already hold the lock. */
> - if (ACCESS_ONCE(tqhp->head_tkt) == inc.tail) {
> + if (ACCESS_ONCE(tqhp->head) == (inc.tail | 0x1)) {
> /* The last holder left before queue formed, we hold lock. */
> - tqhp->head_tkt = -1;
> return true;
> }
>
> @@ -290,16 +274,14 @@ tkt_q_init_contend(int i, arch_spinlock_t *lock, struct __raw_tickets inc)
> * Record the head counter in case one of the spinning
> * CPUs already holds the lock but doesn't realize it yet.
> */
> - tqhp->head_tkt = asold.tickets.head;
> + tqhp->head = asold.tickets.head | 0x1;
>
> /* The low-order bit in the head counter says "queued". */
> - asnew.tickets.head |= 0x1;
> + asnew.tickets.head = (i << 2) + 0x1;
> } while (cmpxchg(&lock->head_tail,
> asold.head_tail,
> asnew.head_tail) != asold.head_tail);
>
> - /* Point the queue at the lock and go spin on it. */
> - ACCESS_ONCE(tqhp->ref) = lock;
> return tkt_q_do_spin(lock, inc);
> }
>
> @@ -321,15 +303,8 @@ bool tkt_q_start_contend(arch_spinlock_t *lock, struct __raw_tickets inc)
> * the lock with the corresponding queue.
> */
> do {
> - /*
> - * Use 0x1 to mark the queue in use, but also avoiding
> - * any spinners trying to use it before we get it all
> - * initialized.
> - */
> if (!tkt_q_heads[i].ref &&
> - cmpxchg(&tkt_q_heads[i].ref,
> - NULL,
> - (arch_spinlock_t *)0x1) == NULL) {
> + cmpxchg(&tkt_q_heads[i].ref, NULL, lock) == NULL) {
>
> /* Succeeded, now go initialize it. */
> return tkt_q_init_contend(i, lock, inc);
>
