[PATCH diagnostic qspinlock] Diagnostics for excessive lock-drop wait loop time

From: Paul E. McKenney
Date: Wed Jan 11 2023 - 19:36:45 EST


We see systems stuck in the queued_spin_lock_slowpath() loop that waits
for the lock to become unlocked in the case where the current CPU has
set pending state. Therefore, this not-for-mainline commit gives a warning
that includes the lock word state if the loop has been spinning for more
than 10 seconds. It also adds a WARN_ON_ONCE() that complains if the
lock is not in pending state.
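
For anyone wanting to try the approach outside the kernel, here is a minimal
userspace sketch of the same idea: spin until the locked byte clears, read
the clock only every so many polls, and print the decoded lock word once the
deadline has passed. It is an illustration under stated assumptions, not the
code in this patch. All names (format_lock_word(), struct spin_watchdog,
wait_for_unlock(), and so on) are hypothetical. C11 atomics and
clock_gettime() stand in for atomic_read_acquire() and jiffies. The
bit layout assumed is the NR_CPUS < 16K one. An explicit "warned" flag
replaces the patch's trick of leaving the counter un-reset after the warning.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Assumed lock-word layout (the kernel's layout for NR_CPUS < 16K). */
#define Q_LOCKED_MASK 0x000000ffu /* bits 0-7: locked byte */
#define Q_PENDING_VAL 0x00000100u /* bit 8: pending */
#define Q_TAIL_SHIFT  16          /* bits 16-31: tail */

/* Hypothetical helper: format a lock word as "<hex> (<L|.><P|.><tail>)". */
static int format_lock_word(char *buf, size_t len, uint32_t val)
{
    return snprintf(buf, len, "%#x (%c%c%#x)", (unsigned int)val,
                    ".L"[!!(val & Q_LOCKED_MASK)],
                    ".P"[!!(val & Q_PENDING_VAL)],
                    (unsigned int)(val >> Q_TAIL_SHIFT));
}

struct spin_watchdog {
    int cnt;         /* polls left before the next clock check */
    int reload;      /* value cnt is reset to after each check */
    double deadline; /* time (seconds) after which a warning is due */
    int warned;      /* set once the warning has been issued */
};

/* Returns 1 on every reload-th call; only then is the clock consulted. */
static int watchdog_should_check(struct spin_watchdog *w)
{
    if (--w->cnt > 0)
        return 0;
    w->cnt = w->reload;
    return 1;
}

/* Returns 1 exactly once: on the first check made after the deadline. */
static int watchdog_expired(struct spin_watchdog *w, double now)
{
    if (w->warned || now <= w->deadline)
        return 0;
    w->warned = 1;
    return 1;
}

static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Spin until the locked byte clears; returns 1 if the warning fired. */
static int wait_for_unlock(_Atomic uint32_t *word, double timeout_s, int reload)
{
    struct spin_watchdog w = { reload, reload, now_seconds() + timeout_s, 0 };
    char buf[64];

    assert(word != NULL);
    for (;;) {
        uint32_t val = atomic_load_explicit(word, memory_order_acquire);

        if (!(val & Q_LOCKED_MASK))
            return w.warned;
        if (watchdog_should_check(&w) && watchdog_expired(&w, now_seconds())) {
            format_lock_word(buf, sizeof(buf), val);
            fprintf(stderr, "still pending and locked: %s\n", buf);
        }
    }
}
```

A caller would have one thread hold the word with the locked byte set and
another call wait_for_unlock(&word, 10.0, 512). The return value reports
whether the warning fired before the locked byte cleared.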

If this is to be placed in production, some reporting mechanism not
involving spinlocks is likely needed, for example, BPF, trace events,
or some combination thereof.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index ac5a3e6d3b564..be1440782c4b3 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -379,8 +379,22 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * clear_pending_set_locked() implementations imply full
 	 * barriers.
 	 */
-	if (val & _Q_LOCKED_MASK)
-		atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_MASK));
+	if (val & _Q_LOCKED_MASK) {
+		int cnt = _Q_PENDING_LOOPS;
+		unsigned long j = jiffies + 10 * HZ;
+		struct qspinlock qval;
+		int val;
+
+		for (;;) {
+			val = atomic_read_acquire(&lock->val);
+			atomic_set(&qval.val, val);
+			WARN_ON_ONCE(!(val & _Q_PENDING_VAL));
+			if (!(val & _Q_LOCKED_MASK))
+				break;
+			if (!--cnt && !WARN(time_after(jiffies, j), "%s: Still pending and locked: %#x (%c%c%#x)\n", __func__, val, ".L"[!!qval.locked], ".P"[!!qval.pending], qval.tail))
+				cnt = _Q_PENDING_LOOPS;
+		}
+	}
 
 	/*
 	 * take ownership and clear the pending bit.