Re: [PATCH 7/8] locking: Introduce qrwlock

From: Waiman Long
Date: Fri Feb 14 2014 - 13:50:03 EST


On 02/13/2014 11:35 AM, Peter Zijlstra wrote:
> On Tue, Feb 11, 2014 at 03:12:59PM -0500, Waiman Long wrote:
> > I used the same locktest program to repeatedly take a single rwlock with
> > a programmable number of threads and count their execution times. Each
> > thread takes the lock 5M times on a 4-socket 40-core Westmere-EX
> > system. I bound all the threads to different CPUs with the following
> > 3 configurations:
> >
> > 1) Both CPUs and lock are in the same node
> > 2) CPUs and lock are in different nodes
> > 3) Half of the CPUs are in the same node as the lock & the other half
> >    are remote
>
> I can't find these configurations in the below numbers; especially the
> first is interesting because most computers out there have no nodes.

I included local and remote numbers in the measurement data that I sent out. The local numbers are for the case where the CPUs and the lock are in the same node; the remote numbers are for when they are in different nodes.
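For reference, the benchmark is essentially the loop below. This is only a minimal user-space sketch of that kind of test, not the actual locktest program (write-side loop only for brevity); the CPU numbers given on the command line select which of the three node configurations is measured:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERATIONS 5000000              /* 5M lock/unlock pairs per thread */

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

struct worker { pthread_t tid; int cpu; double secs; };

static void *worker_fn(void *arg)
{
        struct worker *w = arg;
        cpu_set_t set;
        struct timespec t0, t1;
        long i;

        /* Bind this thread to its assigned CPU. */
        CPU_ZERO(&set);
        CPU_SET(w->cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < ITERATIONS; i++) {
                pthread_rwlock_wrlock(&lock);
                pthread_rwlock_unlock(&lock);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        w->secs = (t1.tv_sec - t0.tv_sec) +
                  (t1.tv_nsec - t0.tv_nsec) / 1e9;
        return NULL;
}

int main(int argc, char **argv)
{
        int i, nthreads = argc - 1;
        struct worker *w = calloc(nthreads, sizeof(*w));

        /* Each command-line argument is a CPU number to pin a thread to. */
        for (i = 0; i < nthreads; i++) {
                w[i].cpu = atoi(argv[i + 1]);
                pthread_create(&w[i].tid, NULL, worker_fn, &w[i]);
        }
        for (i = 0; i < nthreads; i++) {
                pthread_join(w[i].tid, NULL);
                printf("thread %d (cpu %d): %.3f s\n", i, w[i].cpu, w[i].secs);
        }
        return 0;
}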

> > Two types of qrwlock are tested:
> > 1) Use MCS lock
> > 2) Use ticket lock
>
> arch_spinlock_t; you forget that if you change that to an MCS style lock
> this one goes along for free.

Yes, I am aware of that. I am not saying that it is a bad idea to use arch_spinlock_t; I will be happy if your version of the qrwlock patch gets merged. I am just saying that it may be a better idea to use the MCS lock directly, especially in the case that the spinlock is not converted to an MCS-style lock. I will be even happier if that happens.
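For anyone following along, the shape under discussion is roughly the following. This is a standalone user-space sketch (C11 atomics, with pthread_spinlock_t standing in for arch_spinlock_t); the names and bit layout are illustrative rather than taken from the posted patch:

#include <pthread.h>
#include <stdatomic.h>

#define QW_LOCKED  0x01u                /* a writer holds the lock    */
#define QW_WAITING 0x02u                /* a writer is queued         */
#define QW_WMASK   (QW_LOCKED | QW_WAITING)
#define QR_BIAS    0x100u               /* one reader's worth of count */

struct qrwlock {
        atomic_uint        cnts;        /* reader count + writer bits  */
        pthread_spinlock_t wait;        /* serializes slowpath waiters */
};

static void qrwlock_init(struct qrwlock *l)
{
        atomic_init(&l->cnts, 0);
        pthread_spin_init(&l->wait, PTHREAD_PROCESS_PRIVATE);
}

static void queued_read_lock(struct qrwlock *l)
{
        /* Fast path: no writer present or pending, just add a reader. */
        if (!(atomic_fetch_add(&l->cnts, QR_BIAS) & QW_WMASK))
                return;

        /* Slow path: back out, then wait in line on the spinlock. */
        atomic_fetch_sub(&l->cnts, QR_BIAS);
        pthread_spin_lock(&l->wait);
        atomic_fetch_add(&l->cnts, QR_BIAS);
        while (atomic_load(&l->cnts) & QW_LOCKED)
                ;                       /* spin until the writer leaves */
        pthread_spin_unlock(&l->wait);
}

static void queued_read_unlock(struct qrwlock *l)
{
        atomic_fetch_sub(&l->cnts, QR_BIAS);
}

static void queued_write_lock(struct qrwlock *l)
{
        unsigned int idle;

        /* Writers wait in line first, then tell readers one is pending. */
        pthread_spin_lock(&l->wait);
        atomic_fetch_add(&l->cnts, QW_WAITING);

        /* Once the readers drain, convert the waiting bit to locked. */
        do {
                idle = QW_WAITING;
        } while (!atomic_compare_exchange_weak(&l->cnts, &idle, QW_LOCKED));
        pthread_spin_unlock(&l->wait);
}

static void queued_write_unlock(struct qrwlock *l)
{
        atomic_fetch_and(&l->cnts, ~QW_LOCKED);
}

The queueing behaviour of the slowpath is exactly the queueing behaviour of the embedded spinlock, which is your point: convert arch_spinlock_t to an MCS-style lock and the rwlock inherits MCS queueing for free.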



> On that; I had a look at your qspinlock and got a massive head-ache so I
> rewrote it. Aside from being very messy code it also suffered from a few
> fairness issues in that it is possible (albeit highly unlikely) to steal
> a lock instead of being properly queued; per your xchg() usage.
>
> The below boots; but I've not done much else with it, so it will
> probably explode in your face.
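
Just to confirm I understand the stealing scenario you mean: with an unconditional atomic op on the locked bit in the fast path, a late arrival can grab the lock ahead of the tasks already queued in the tail. A sketch of that hazard, with a hypothetical single-word layout that is not the code from either patch:

#include <stdatomic.h>
#include <stdbool.h>

/* Bit 0 is the locked bit; the upper bits encode the wait-queue tail. */
#define LOCKED 0x01u

static atomic_uint val;

/*
 * Unfair fast path: sets the locked bit regardless of the tail bits,
 * so the moment the owner releases, a brand-new arrival can take the
 * lock ahead of the waiters already encoded in the tail.
 */
static bool trylock_unfair(void)
{
        return !(atomic_fetch_or(&val, LOCKED) & LOCKED);
}

/*
 * Fair fast path: succeeds only when the whole word is idle, i.e. the
 * lock is free AND nobody is queued; everyone else joins the queue.
 */
static bool trylock_fair(void)
{
        unsigned int idle = 0;
        return atomic_compare_exchange_strong(&val, &idle, LOCKED);
}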

Thanks for looking into my qspinlock patch. I will take a look at your changes and incorporate them to make the lock fairer. I have already rewritten the patch along the same lines as your version of the qrwlock patch.

I have done some performance testing at low contention levels using my microbenchmark. The qspinlock was indeed slower than the ticket lock with 2-4 contending tasks; the break-even point is at 5 contending tasks. To fix this performance deficit, I added an optimized x86-specific contention path for 2 contending tasks so that it performs better than the ticket lock in that case. It will still be somewhat slower with 3-4 contending tasks, but the 2-task case is probably the most common.
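The idea behind the 2-task path is roughly the following: the second contender claims a pending bit and spins on the lock word inline, skipping the cost of setting up and queueing an MCS node. This is only a user-space sketch of the general technique with hypothetical names; the actual x86 code will be in the updated patch:

#include <stdatomic.h>
#include <stdbool.h>

#define LOCKED  0x01u
#define PENDING 0x02u

static atomic_uint qval;

static bool trylock(void)
{
        unsigned int idle = 0;
        return atomic_compare_exchange_strong(&qval, &idle, LOCKED);
}

static void lock(void)
{
        if (trylock())
                return;                 /* uncontended fast path */

        /*
         * Second contender: claim the pending bit and spin on the
         * lock word directly -- no queue node, no extra cache line.
         */
        unsigned int v = LOCKED;
        if (atomic_compare_exchange_strong(&qval, &v, LOCKED | PENDING)) {
                while (atomic_load(&qval) & LOCKED)
                        ;               /* wait for the owner to leave */
                atomic_store(&qval, LOCKED);    /* pending -> owner */
                return;
        }

        /* Third and later contenders would queue MCS-style (omitted). */
        while (!trylock())
                ;
}

static void unlock(void)
{
        atomic_fetch_and(&qval, ~LOCKED);
}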

With that change, I would say that my qspinlock patch should be good enough as a replacement for the ticket spinlock on x86. I will send out an updated qspinlock patch in a day or two when I finish my testing.

-Longman