Re: [PATCH v3 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock

From: Waiman Long
Date: Tue Jul 16 2019 - 10:50:15 EST


On 7/16/19 10:29 AM, Alex Kogan wrote:
>
>> On Jul 15, 2019, at 7:22 PM, Waiman Long <longman@xxxxxxxxxx
>> <mailto:longman@xxxxxxxxxx>> wrote:
>>
>> On 7/15/19 5:30 PM, Waiman Long wrote:
>>>> -#ifndef _GEN_PV_LOCK_SLOWPATH
>>>> +#if !defined(_GEN_PV_LOCK_SLOWPATH) && !defined(_GEN_CNA_LOCK_SLOWPATH)
>>>>
>>>> #include <linux/smp.h>
>>>> #include <linux/bug.h>
>>>> @@ -77,18 +77,14 @@
>>>> #define MAX_NODES 4
>>>>
>>>> /*
>>>> - * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
>>>> - * size and four of them will fit nicely in one 64-byte cacheline. For
>>>> - * pvqspinlock, however, we need more space for extra data. To accommodate
>>>> - * that, we insert two more long words to pad it up to 32 bytes. IOW, only
>>>> - * two of them can fit in a cacheline in this case. That is OK as it is rare
>>>> - * to have more than 2 levels of slowpath nesting in actual use. We don't
>>>> - * want to penalize pvqspinlocks to optimize for a rare case in native
>>>> - * qspinlocks.
>>>> + * On 64-bit architectures, the mcs_spinlock structure will be 20 bytes in
>>>> + * size. For pvqspinlock or the NUMA-aware variant, however, we need more
>>>> + * space for extra data. To accommodate that, we insert two more long words
>>>> + * to pad it up to 36 bytes.
>>>> */
>>> The 20 bytes figure is wrong. It is actually 24 bytes for 64-bit as the
>>> mcs_spinlock structure is 8-byte aligned. For better cacheline
>>> alignment, I will like to keep mcs_spinlock to 16 bytes as before.
>>> Instead, you can use encode_tail() to store the CNA node pointer in
>>> "locked". For instance, use (encode_tail() << 1) in locked to
>>> distinguish it from the regular locked=1 value.
>>
>> Actually, the encoded tail value is already shift left either 16 bits
>> or 9 bits. So there is no need to shift it. You can assigned it directly:
>>
>> mcs->locked = cna->encoded_tail;
>>
>> You do need to change the type of locked to "unsigned int", though,
>> for proper comparison with "1".
>>
> Got it, thanks.
>
I forgot to mention that I would like to see a boot command line option
to force off and maybe on as well the numa qspinlock code. This can help
in testing as you don't need to build 2 separate kernels, one with
NUMA_AWARE_SPINLOCKS on and one with it off. For small 2-socket systems,
numa qspinlock may not help much. So an option to turn it off can be
useful. Xen also have an option to turns off PV qspinlock.

-Longman