Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount

From: Waiman Long
Date: Wed Jun 26 2013 - 19:26:38 EST


On 06/26/2013 05:22 PM, Andi Kleen wrote:
> On Wed, Jun 26, 2013 at 05:07:02PM -0400, Waiman Long wrote:
>> On 06/26/2013 04:17 PM, Andi Kleen wrote:
>>>> + * The combined data structure is 8-byte aligned. So proper placement of this
>>>> + * structure in the larger embedding data structure is needed to ensure that
>>>> + * there is no hole in it.
>>> On i386 a u64 is only 4-byte aligned. So you need to explicitly align
>>> it to 8 bytes. Otherwise you risk the two members crossing a cache line,
>>> which would be really expensive with atomics.
>> Do you mean the original i386 or the i586 that most distributions now
>> use? If it is the former, I recall that i386 is no longer supported.
> I mean i386, as in the 32-bit x86 architecture.
>
>> I also looked at some existing code that uses cmpxchg64. It doesn't
>> seem to use explicit alignment. I will need more investigation to see
>> if it is a real problem.
> Adding the alignment is basically free. If 32-bit users don't enforce
> it, they're potentially broken, yes, but they may be lucky.

You are right. I will add the 8-byte alignment attribute to the union
definition and document that in the code.
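
For illustration, a minimal sketch of what the aligned union could look
like (hypothetical names, not the exact code from the patch):

#include <linux/spinlock.h>
#include <linux/types.h>

/*
 * Hypothetical combined lock/count word. The explicit 8-byte alignment
 * is the fix discussed above: on i386 a u64 is only 4-byte aligned by
 * default, so without the attribute the union could straddle a cache
 * line and make the 64-bit cmpxchg much more expensive.
 */
union lock_refcount {
	u64 lock_count;			/* whole word for cmpxchg64() */
	struct {
		unsigned int count;	/* reference count */
		spinlock_t lock;	/* embedded spinlock */
	};
} __attribute__((aligned(8)));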

>>>> +	get_lock = ((threshold >= 0) && (old.count == threshold));
>>>> +	if (likely(!get_lock && spin_can_lock(&old.lock))) {
>>> What is that for? Why can't you do the CMPXCHG unconditionally?
>> An unconditional CMPXCHG can be as bad as acquiring the spinlock. So
>> we need to check that the conditions are right before doing an actual
>> CMPXCHG.
> But this isn't a cheap check. Especially spin_unlock_wait can be
> very expensive.
> And all these functions have weird semantics. Perhaps just a quick
> spin_is_locked.

In the uncontended case, doing spin_unlock_wait will be similar to
spin_can_lock. That, when combined with a cmpxchg, is still faster than
doing two atomic operations in spin_lock/spin_unlock.

In the contended case, doing spin_unlock_wait won't be more expensive
than doing spin_lock. Without it, most of the threads will fall back to
acquiring the lock, negating any performance benefit. I tried it without
spin_unlock_wait and there wasn't much performance gain in the AIM7
short workload. BTW, spin_can_lock is just the negation of
spin_is_locked.
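
To make the trade-off concrete, here is a minimal sketch of the fast
path we are discussing (hypothetical function name, building on the
union sketched above; not the exact code from the patch):

static inline int lockref_add_unless(union lock_refcount *lr,
				     int value, int threshold)
{
	union lock_refcount old, new;

	for (;;) {
		old.lock_count = ACCESS_ONCE(lr->lock_count);
		/* Give up once the count has reached the threshold. */
		if (threshold >= 0 && old.count == threshold)
			return 0;
		if (spin_can_lock(&old.lock)) {
			new.lock_count = old.lock_count;
			new.count += value;
			/*
			 * Attempt the cmpxchg only when the embedded lock
			 * looked free in the snapshot; a failed cmpxchg
			 * still costs a cache-line transaction.
			 */
			if (cmpxchg64(&lr->lock_count, old.lock_count,
				      new.lock_count) == old.lock_count)
				return 1;	/* lockless update done */
		} else {
			/*
			 * Wait for the current lock holder instead of
			 * falling back to taking the lock ourselves.
			 */
			spin_unlock_wait(&lr->lock);
		}
	}
}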

Regards,
Longman