Re: [PATCH 2/5] riscv: Add QUEUED_SPINLOCKS & QUEUED_RWLOCKS supported

From: Guo Ren
Date: Wed Nov 25 2020 - 20:37:05 EST


Hi Will,

On Wed, Nov 25, 2020 at 10:31 PM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Wed, Nov 25, 2020 at 03:16:45PM +0100, Peter Zijlstra wrote:
> > @@ -207,6 +187,32 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
> > atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
> > }
> >
> > +#endif /* _Q_PENDING_BITS == 8 */
> > +
> > +#if _Q_PENDING_BITS == 8 && ARCH_HAS_XCHG16
> > +
> > +/*
> > + * xchg_tail - Put in the new queue tail code word & retrieve previous one
> > + * @lock : Pointer to queued spinlock structure
> > + * @tail : The new queue tail code word
> > + * Return: The previous queue tail code word
> > + *
> > + * xchg(lock, tail), which heads an address dependency
> > + *
> > + * p,*,* -> n,*,* ; prev = xchg(lock, node)
> > + */
> > +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
> > +{
> > + /*
> > + * We can use relaxed semantics since the caller ensures that the
> > + * MCS node is properly initialized before updating the tail.
> > + */
> > + return (u32)xchg_relaxed(&lock->tail,
> > + tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
> > +}
> > +
> > +#else /* !(_Q_PENDING_BITS == 8 && ARCH_HAS_XCHG16) */
>
> Why can't architectures just implement this with a 32-bit xchg instruction
> if they don't have one that operates on 16 bits? Sure, they'll store more
> data, but it's atomic so you shouldn't be able to tell... (ignoring parisc
> crazy).
>
> Also, I'm surprised qspinlock benefits riscv. On arm64, there's nothing in
> it over tickets for <= 16 CPUs.
NUMA is on the way:
https://lore.kernel.org/linux-riscv/20201119003829.1282810-1-atish.patra@xxxxxxx/

With your advice, I think we could using tickets lock when <= 16 CPUs
and using qspinlock when > 16 CPUs.
Is that right?

The next patchset plan is:
- Using tickets & qspinlock together in riscv. Abandon 16bits
xchg/cmpxchg implementation.
- Abanden qspinlock in csky, because it only could 4 CPUs' SMP.

>
> Will



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/