Re: [PATCH v2] locking/qspinlock: Save previous node & owner CPU into mcs_spinlock

From: Peter Zijlstra
Date: Mon May 06 2024 - 06:36:49 EST


On Fri, May 03, 2024 at 10:41:06PM -0400, Waiman Long wrote:

Not much of a fan of this thing, but I suppose it won't hurt too much ...

> diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
> index 85251d8771d9..cbe6f07dff2e 100644
> --- a/kernel/locking/mcs_spinlock.h
> +++ b/kernel/locking/mcs_spinlock.h
> @@ -13,12 +13,19 @@
> #ifndef __LINUX_MCS_SPINLOCK_H
> #define __LINUX_MCS_SPINLOCK_H
>
> +/*
> + * Save an encoded version of the current MCS lock owner CPU to the
> + * mcs_spinlock structure of the next lock owner.
> + */
> +#define MCS_LOCKED (smp_processor_id() + 1)
> +
> #include <asm/mcs_spinlock.h>
>
> struct mcs_spinlock {
> struct mcs_spinlock *next;
> - int locked; /* 1 if lock acquired */
> - int count; /* nesting count, see qspinlock.c */
> + int locked; /* non-zero if lock acquired */
> + short count; /* nesting count, see qspinlock.c */
> + short prev_node; /* encoded previous node value */

Strictly speaking, count shouldn't ever go much higher than 4, so I
suppose a byte should be sufficient. That would then leave 24 bits for
the prev thing, but you'll get to use bitfields or some other horrible
thing.
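
Untested sketch of what that packing could look like (the exact widths
here are an assumption, not something the patch mandates):

	struct mcs_spinlock {
		struct mcs_spinlock *next;
		int locked;		/* non-zero if lock acquired */
		u32 count    :  8;	/* nesting count, see qspinlock.c */
		u32 prev_node: 24;	/* encoded previous node value */
	};

That keeps the structure the same size while giving prev_node enough
room to not truncate until somewhere around 4M CPUs.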

> };
>
> #ifndef arch_mcs_spin_lock_contended
> @@ -42,7 +49,7 @@ do { \
> * unlocking.
> */
> #define arch_mcs_spin_unlock_contended(l) \
> - smp_store_release((l), 1)
> + smp_store_release((l), MCS_LOCKED)

This leaves ARM up a creek I suppose, since arm overrides
arch_mcs_spin_unlock_contended() and will keep storing a literal 1.
Also, there's but a single MCS_LOCKED user.
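
For reference, the arm override in arch/arm/include/asm/mcs_spinlock.h
looks something like this, with the 1 hard-coded, so the encoded owner
would never show up there:

	#define arch_mcs_spin_unlock_contended(lock)			\
	do {								\
		smp_store_release(lock, 1);				\
		dsb_sev();						\
	} while (0)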

> #endif
>
> /*
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index ebe6b8ec7cb3..df78d13efa3d 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -436,6 +436,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>
> node->locked = 0;
> node->next = NULL;
> + node->prev_node = 0;
> pv_init_node(node);
>
> /*
> @@ -463,6 +464,13 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> old = xchg_tail(lock, tail);
> next = NULL;
>
> + /*
> + * The prev_node value is saved for crash dump analysis purposes only;
> + * it is not used within the qspinlock code. The encoded node value
> + * may be truncated if there are 16k or more CPUs in the system.
> + */
> + node->prev_node = old >> _Q_TAIL_IDX_OFFSET;
> +
> /*
> * if there was a previous node; link it and wait until reaching the
> * head of the waitqueue.
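
If anyone does end up reading this out of a dump: assuming the usual
_Q_TAIL_IDX_BITS == 2 layout, prev_node is ((cpu + 1) << 2) | idx, so
something like the below (hypothetical helpers, not part of the patch)
recovers the interesting bits:

	/* Returns -1 when the queue was empty, i.e. no previous node. */
	static inline int prev_node_cpu(unsigned short prev_node)
	{
		return (prev_node >> 2) - 1;
	}

	/* Per-CPU nesting index of the previous node, 0..3. */
	static inline int prev_node_idx(unsigned short prev_node)
	{
		return prev_node & 3;
	}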
> --
> 2.39.3
>