Re: [PATCH v2] locking/qspinlock: Save previous node & owner CPU into mcs_spinlock

From: Waiman Long
Date: Mon May 06 2024 - 11:05:45 EST


On 5/6/24 06:36, Peter Zijlstra wrote:
> On Fri, May 03, 2024 at 10:41:06PM -0400, Waiman Long wrote:
>
> Not much of a fan of this thing, but I suppose it won't hurt too much ...
I purposely didn't touch the fast path and added code only to the slow path to make sure there shouldn't be any noticeable performance impact.
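For reference, the generic fast path in include/asm-generic/qspinlock.h (reproduced from memory and lightly annotated; unmodified by this patch) is a single try-cmpxchg, so the new stores only happen once a task has entered the slow path:

static __always_inline void queued_spin_lock(struct qspinlock *lock)
{
	int val = 0;

	/* Uncontended case: one atomic cmpxchg, no MCS node is touched. */
	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
		return;

	/* Contended case: only here does the patch add any code. */
	queued_spin_lock_slowpath(lock, val);
}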
> > diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
> > index 85251d8771d9..cbe6f07dff2e 100644
> > --- a/kernel/locking/mcs_spinlock.h
> > +++ b/kernel/locking/mcs_spinlock.h
> > @@ -13,12 +13,19 @@
> >  #ifndef __LINUX_MCS_SPINLOCK_H
> >  #define __LINUX_MCS_SPINLOCK_H
> >
> > +/*
> > + * Save an encoded version of the current MCS lock owner CPU to the
> > + * mcs_spinlock structure of the next lock owner.
> > + */
> > +#define MCS_LOCKED (smp_processor_id() + 1)
> > +
> >  #include <asm/mcs_spinlock.h>
> >
> >  struct mcs_spinlock {
> >  	struct mcs_spinlock *next;
> > -	int locked; /* 1 if lock acquired */
> > -	int count;  /* nesting count, see qspinlock.c */
> > +	int locked;	/* non-zero if lock acquired */
> > +	short count;	/* nesting count, see qspinlock.c */
> > +	short prev_node; /* encoded previous node value */
> Strictly speaking, count shouldn't ever go much higher than 4, so I
> suppose a byte should be sufficient. That would then leave 24bit for the
> prev thing, but you'll get to use bitfields or some other horrible
> thing.
Using a bit field will be more expensive, so I am deferring that until the time it becomes necessary.
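To illustrate the trade-off, the bit-field layout Peter is suggesting would look roughly like this (a hypothetical sketch, not part of the patch):

struct mcs_spinlock {
	struct mcs_spinlock *next;
	int locked;			/* non-zero if lock acquired */
	unsigned int count     : 8;	/* nesting count, see qspinlock.c */
	unsigned int prev_node : 24;	/* encoded previous node value */
};

Every count or prev_node update would then compile to a load/mask/shift/store sequence on the shared word, whereas a plain short can be written with a single 16-bit store on most architectures.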
> >  };
> >
> >  #ifndef arch_mcs_spin_lock_contended
> > @@ -42,7 +49,7 @@ do { \
> >   * unlocking.
> >   */
> >  #define arch_mcs_spin_unlock_contended(l) \
> > -	smp_store_release((l), 1)
> > +	smp_store_release((l), MCS_LOCKED)
> This leaves ARM up a creek I suppose... Also, there's but a single
> MCS_LOCKED user.

I am aware that ARM has its own arch_mcs_spin_unlock_contended() macro defined. That is why I define the MCS_LOCKED macro before including asm/mcs_spinlock.h. I am planning to send an arm patch to change the 1 to MCS_LOCKED if this patch is merged, to close the gap. No other arch has an arch_mcs_spin_unlock_contended() macro defined.
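For reference, the arm version in arch/arm/include/asm/mcs_spinlock.h pairs the release store with dsb_sev(); the follow-up patch would presumably be nothing more than the one-line change sketched below (not yet posted):

#define arch_mcs_spin_unlock_contended(lock)	\
do {						\
	/* was: smp_store_release(lock, 1) */	\
	smp_store_release(lock, MCS_LOCKED);	\
	dsb_sev();				\
} while (0)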

Cheers, Longman