Re: [PATCH RFC tip/core/rcu 6/6] rcu: Reduce cache-miss initialization latencies for large systems

From: Paul E. McKenney
Date: Fri Apr 27 2012 - 11:15:44 EST


On Fri, Apr 27, 2012 at 06:36:11AM +0200, Mike Galbraith wrote:
> On Mon, 2012-04-23 at 09:42 -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> >
> > Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper
> > limit of 16 on the leaf-level fanout for the rcu_node tree. This was
> > needed to reduce lock contention that was induced by the synchronization
> > of scheduling-clock interrupts, which was in turn needed to improve
> > energy efficiency for moderate-sized lightly loaded servers.
> >
> > However, reducing the leaf-level fanout means that there are more
> > leaf-level rcu_node structures in the tree, which in turn means that
> > RCU's grace-period initialization incurs more cache misses. This is
> > not a problem on moderate-sized servers with only a few tens of CPUs,
>
> With a distro config (4096 CPUs) interrupt latency is bad even on a
> quad. Traversing empty nodes taking locks and cache misses hurts.

Agreed -- and I will be working on an additional patch that makes RCU
avoid initializing its data structures for CPUs that don't exist.

That said, increasing the leaf-level fanout from 16 to 64 should reduce
the latency pain by a factor of four. In addition, I would expect that
real-time builds of the kernel would set NR_CPUS to some value much
smaller than 4096. ;-)
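
As a rough check on the factor-of-four estimate, here is a minimal
sketch that counts only the leaf-level rcu_node structures needed to
cover NR_CPUS=4096 at each fanout. The helper name leaf_nodes() is
hypothetical and is not kernel code. The sketch also assumes that
initialization cost scales with the number of leaf nodes visited and
ignores the upper levels of the tree.

```python
def leaf_nodes(nr_cpus: int, leaf_fanout: int) -> int:
    """Leaf-level nodes needed to cover nr_cpus CPUs (hypothetical helper).

    Rounds up, because a partially filled leaf still counts as one node.
    """
    return (nr_cpus + leaf_fanout - 1) // leaf_fanout


NR_CPUS = 4096  # distro config value mentioned in the thread

old_leaves = leaf_nodes(NR_CPUS, 16)  # leaf fanout capped at 16
new_leaves = leaf_nodes(NR_CPUS, 64)  # leaf fanout raised to 64

# Assumption: cost is proportional to the number of leaf nodes touched.
print(old_leaves, new_leaves, old_leaves // new_leaves)  # prints: 256 64 4
```

Under those assumptions, the leaf count drops from 256 to 64, which is
where the factor of four comes from.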

Thanx, Paul
