Re: [PATCH v3 2/7] locking/pvqspinlock: Add pending bit support

From: Davidlohr Bueso
Date: Sun Jul 26 2015 - 20:56:54 EST


On Wed, 2015-07-22 at 16:12 -0400, Waiman Long wrote:
> Like the native qspinlock, using the pending bit to acquire the lock
> when it is lightly loaded is faster than going through the PV queuing
> process, which is even slower than the native queuing process. It also
> avoids loading two additional cachelines (the MCS and PV nodes).
>
> This patch adds the pending bit support for PV qspinlock. The pending
> bit code has a smaller spin threshold (1<<10). It will default back
> to the queuing method if it cannot acquire the lock within a certain
> time limit.

Can we infer that this new spin threshold is the metric for detecting
these "light loads"? If so, I cannot help but wonder if there is a more
straightforward/ad-hoc way of detecting this, i.e. some pv_<> function.
That would also avoid wasting the spinning time, as it would not be time
based. Although such a call might be more costly altogether, I dunno.
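
To make that concrete, something along these lines is what I have in
mind -- a purely hypothetical sketch, mind you: pv_lock_holder_running()
does not exist anywhere, it just stands for whatever hint the hypervisor
could export:

static bool pv_worth_spinning(struct qspinlock *lock)
{
        u32 val = atomic_read(&lock->val);

        /* Waiters already queued: definitely not a light load. */
        if (val & ~(_Q_LOCKED_MASK | _Q_PENDING_MASK))
                return false;

        /*
         * Otherwise ask the hypervisor (made-up hook) whether the lock
         * holder's vCPU is actually running, i.e. whether spinning has
         * any chance of succeeding soon.
         */
        return pv_lock_holder_running(lock);
}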

Some comments about this 'loop' threshold.

> +static int pv_pending_lock(struct qspinlock *lock, u32 val)
> +{
> +        int loop = PENDING_SPIN_THRESHOLD;
> +        u32 new, old;
> +
> +        /*
> +         * wait for in-progress pending->locked hand-overs
> +         */
> +        if (val == _Q_PENDING_VAL) {
> +                while (((val = atomic_read(&lock->val)) == _Q_PENDING_VAL) &&
> +                        loop--)
> +                        cpu_relax();
> +        }
> +
> +        /*
> +         * trylock || pending
> +         */
> +        for (;;) {
> +                if (val & ~_Q_LOCKED_MASK)
> +                        goto queue;
> +                new = _Q_LOCKED_VAL;
> +                if (val == new)
> +                        new |= _Q_PENDING_VAL;
> +                old = atomic_cmpxchg(&lock->val, val, new);
> +                if (old == val)
> +                        break;
> +                if (loop-- <= 0)
> +                        goto queue;
> +        }

So I'm not clear about the semantics of what occurs (or should occur)
when the threshold is exhausted. In the trylock/pending loop above, you
immediately return 0, indicating that we want to queue. Ok, but below:

> +
> +        if (new == _Q_LOCKED_VAL)
> +                goto gotlock;
> +        /*
> +         * We are pending, wait for the owner to go away.
> +         */
> +        while (((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK)
> +                && (loop-- > 0))
> +                cpu_relax();
> +
> +        if (!(val & _Q_LOCKED_MASK)) {
> +                clear_pending_set_locked(lock);
> +                goto gotlock;
> +        }
> +        /*
> +         * Clear the pending bit and fall back to queuing
> +         */
> +        clear_pending(lock);

... you call clear_pending before returning. Is this intentional? Smells
fishy.

And basically, afaict, all this chunk of code does is spin until loop is
exhausted and break out when we get the lock. I.e., something like this
is a lot cleaner:

while (loop--) {
        /*
         * We are pending, wait for the owner to go away.
         */
        val = smp_load_acquire(&lock->val.counter);
        if (!(val & _Q_LOCKED_MASK)) {
                clear_pending_set_locked(lock);
                goto gotlock;
        }

        cpu_relax();
}

/*
 * Clear the pending bit and fall back to queuing
 */
clear_pending(lock);


> +queue:
> +        return 0;
> +
> +gotlock:
> +        return 1;
> +}
> +
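
For context, this is roughly how I read that 0/1 return value being
consumed from the native slowpath -- my paraphrase of the call site,
not the actual hunk from your series:

        if (pv_enabled()) {
                if (pv_pending_lock(lock, val))
                        return;         /* 1: got the lock via the pending bit */
                goto queue;             /* 0: fall back to the PV queuing path */
        }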

Thanks,
Davidlohr
