Re: Question about qspinlock

From: Yun Levi
Date: Sat Jan 16 2021 - 02:36:07 EST


Sorry.. I see now that the spin_lock() slowpath only runs after preempt_disable(),
so a queued waiter cannot be scheduled out and the scenario below cannot happen.
Sorry for the noise.
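
For the record, the generic contended path disables preemption before it ever
touches the per-CPU MCS nodes; roughly (an excerpt simplified from
include/linux/spinlock_api_smp.h, details vary by kernel version and config):

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	preempt_disable();
	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
	/* on contention this ends up in queued_spin_lock_slowpath() */
	LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}

So the per-CPU qnode index can only nest through interrupt/NMI contexts, not
through a context switch.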

On Fri, Jan 15, 2021 at 11:03 AM Yun Levi <ppbuk5246@xxxxxxxxx> wrote:
>
> Hi Peter, Ingo, Will and linux-kernel.
>
> While reading the code of the queued_spin_lock_slowpath() function,
> I have some questions about a nesting case involving unrelated locks while
> a qspinlock waiter is queued.
>
> Suppose there are CPUs 1 to 8, and two locks, lock1 and lock2, which are
> unrelated to each other.
>
> At first, Thread 1 takes lock1, and Thread 2 and Thread 3 contend for lock1,
> waiting on CPU2 and CPU3 respectively.
>
> Next, Thread 5 takes lock2, and Thread 6 and Thread 7 contend for lock2,
> waiting on CPU6 and CPU7 respectively.
>
> In this situation, Thread 2 uses up its time slice and CPU2 switches to a new
> thread, "NTHREAD". NTHREAD then tries to take lock2, sees that someone is
> already waiting in that queue, and tries to add itself to the per-CPU queue
> nodes as well.
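>
> For reference, the node-allocation step in the slowpath looks roughly like
> this (simplified from kernel/locking/qspinlock.c; the count is per CPU and
> shared by all locks):
>
> 	node = this_cpu_ptr(&qnodes[0].mcs);
> 	idx = node->count++;			/* nesting level on this CPU */
> 	tail = encode_tail(smp_processor_id(), idx);
> 	...
> 	node = grab_mcs_node(node, idx);	/* use qnodes[idx] of this CPU */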
>
> My questions are:
> 1. When this happens, the qnode idx on CPU2 will not be 0, so it looks like
> nesting. Can this situation actually happen, or is this just a gap in my
> understanding of the code?
>
> 2. If the situation in (1) cannot happen, why not?
>
> 3. If the situation in (1) can happen, the lock could be corrupted when more
> than three threads are waiting, one of them is unlocked, and another one
> waits again on the same CPU. So how about making a task spin on trylock, as
> in the idx >= MAX_NODES case, when it tries to queue on a CPU whose node is
> already queued waiting for a different lock?
>
> My idea is:
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index b9515fcc9b29..fbb6e2effb59 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -106,6 +106,7 @@ struct qnode {
>   * PV doubles the storage and uses the second cacheline for PV state.
>   */
>  static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]);
> +static DEFINE_PER_CPU(struct qspinlock *, curr_lock);
>
>  /*
>   * We must be able to distinguish between no-tail and the tail at 0:0,
> @@ -315,6 +316,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
>  void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  {
>  	struct mcs_spinlock *prev, *next, *node;
> +	struct qspinlock *saved_lock = NULL;
>  	u32 old, tail;
>  	int idx;
>
> @@ -401,6 +403,13 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	idx = node->count++;
>  	tail = encode_tail(smp_processor_id(), idx);
>
> +	if (likely(!idx)) {
> +		*this_cpu_ptr(&curr_lock) = lock;
> +		saved_lock = lock;
> +	} else {
> +		saved_lock = *this_cpu_ptr(&curr_lock);
> +	}
> +
>  	/*
>  	 * 4 nodes are allocated based on the assumption that there will
>  	 * not be nested NMIs taking spinlocks. That may not be true in
> @@ -410,7 +419,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	 * any MCS node. This is not the most elegant solution, but is
>  	 * simple enough.
>  	 */
> -	if (unlikely(idx >= MAX_NODES)) {
> +	if (unlikely((idx >= MAX_NODES) || lock != saved_lock)) {
>  		lockevent_inc(lock_no_node);
>  		while (!queued_spin_trylock(lock))
>  			cpu_relax();
> @@ -557,7 +566,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	/*
>  	 * release the node
>  	 */
> -	__this_cpu_dec(qnodes[0].mcs.count);
> +	if (unlikely(saved_lock == lock))
> +		__this_cpu_dec(qnodes[0].mcs.count);
>  }
>  EXPORT_SYMBOL(queued_spin_lock_slowpath);
>
> Thank you.