Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier

From: Mathieu Desnoyers
Date: Fri Jan 08 2010 - 20:02:40 EST

* Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> On Fri, Jan 08, 2010 at 06:53:38PM -0500, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> > > Well, if we just grab the task_rq(task)->lock here, then we should be
> > > OK? We would guarantee that curr is either the task we want or not.
> >
> > Hrm, I just tested it, and there seems to be a significant performance
> > penalty involved with taking these locks for each CPU, even with just 8
> > cores. So if we can do without the locks, that would be preferred.
> How significant? Factor of two? Two orders of magnitude?

On an 8-core Intel Xeon (T is the number of threads receiving the IPIs):

Without runqueue locks:

T=1: 0m13.911s
T=2: 0m20.730s
T=3: 0m21.474s
T=4: 0m27.952s
T=5: 0m26.286s
T=6: 0m27.855s
T=7: 0m29.695s

With runqueue locks:

T=1: 0m15.802s
T=2: 0m22.484s
T=3: 0m24.751s
T=4: 0m29.134s
T=5: 0m30.094s
T=6: 0m33.090s
T=7: 0m33.897s

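So on 8 cores, taking spinlocks for each of the 8 runqueues adds about
14% overhead when doing an IPI to 1 thread (15.802s vs. 13.911s). Since
one lock is taken per runqueue, I expect that cost to grow with the
number of CPUs, so it won't be pretty on 128+-core machines.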
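
For reference, the relative overhead at each thread count can be
recomputed from the two tables above. This is only a minimal sketch in
Python: the timings are the measured ones, but the variable and function
names are made up for illustration and are not part of the patch or the
benchmark.

```python
# Wall-clock times in seconds, copied from the tables above.
# Keys are T, the number of threads receiving the IPIs.
without_locks = {1: 13.911, 2: 20.730, 3: 21.474, 4: 27.952,
                 5: 26.286, 6: 27.855, 7: 29.695}
with_locks = {1: 15.802, 2: 22.484, 3: 24.751, 4: 29.134,
              5: 30.094, 6: 33.090, 7: 33.897}


def overhead_percent(base, locked):
    """Extra time taken by the locked run, as a percentage of the base run."""
    return (locked - base) / base * 100.0


# Hypothetical helper table: overhead of taking the runqueue locks, per T.
overheads = {t: overhead_percent(without_locks[t], with_locks[t])
             for t in sorted(without_locks)}

for t, pct in overheads.items():
    print(f"T={t}: {pct:.1f}% slower with runqueue locks")
```

The per-T numbers are noisy (these are single runs), so they are better
read as a rough range than as a precise figure.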



Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68