Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier

From: Mathieu Desnoyers
Date: Fri Jan 08 2010 - 20:02:40 EST

Next message: Trond Myklebust: "[RFC PATCH 0/2] Fix up the NFS mmap code"
Previous message: Mike Frysinger: "Re: [PATCH 5/6] NOMMU: Fix race between ramfs truncation and shared mmap"
In reply to: Paul E. McKenney: "Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier"
Next in thread: Paul E. McKenney: "Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier"

* Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> On Fri, Jan 08, 2010 at 06:53:38PM -0500, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> > > Well, if we just grab the task_rq(task)->lock here, then we should be
> > > OK? We would guarantee that curr is either the task we want or not.
> >
> > Hrm, I just tested it, and there seems to be a significant performance
> > penalty involved with taking these locks for each CPU, even with just 8
> > cores. So if we can do without the locks, that would be preferred.
>
> How significant? Factor of two? Two orders of magnitude?
>

On an 8-core Intel Xeon (T is the number of threads receiving the IPIs):

Without runqueue locks:

T=1: 0m13.911s
T=2: 0m20.730s
T=3: 0m21.474s
T=4: 0m27.952s
T=5: 0m26.286s
T=6: 0m27.855s
T=7: 0m29.695s

With runqueue locks:

T=1: 0m15.802s
T=2: 0m22.484s
T=3: 0m24.751s
T=4: 0m29.134s
T=5: 0m30.094s
T=6: 0m33.090s
T=7: 0m33.897s

So on 8 cores, taking spinlocks for each of the 8 runqueues adds about
15% overhead when doing an IPI to 1 thread. Since one lock is taken per
CPU, that won't be pretty on 128+-core machines.
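
For reference, the relative cost of the runqueue locks can be recomputed
from the timings above. This is only a sketch of the arithmetic: the
numbers are copied from the two tables, while the names (without_locks,
with_locks, overhead_percent) are made up for illustration and are not
part of the patch or the test program. For T=1 it works out to a bit
under 14%, in line with the "about 15%" figure.

```python
# Wall-clock times in seconds, copied from the two tables above,
# keyed by T (the number of threads receiving the IPIs).
without_locks = {1: 13.911, 2: 20.730, 3: 21.474, 4: 27.952,
                 5: 26.286, 6: 27.855, 7: 29.695}
with_locks = {1: 15.802, 2: 22.484, 3: 24.751, 4: 29.134,
              5: 30.094, 6: 33.090, 7: 33.897}


def overhead_percent(base, locked):
    """Extra run time of the locked variant, as a percentage of base."""
    return (locked - base) / base * 100.0


# Overhead of taking the runqueue locks, per value of T.
overheads = {t: overhead_percent(without_locks[t], with_locks[t])
             for t in without_locks}

for t, pct in sorted(overheads.items()):
    print(f"T={t}: {pct:5.1f}% slower with runqueue locks")
```

The per-T values vary quite a bit from run to run, so they are best read
as a rough range rather than a precise figure.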

Thanks,

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68