Re: RCU scaling on large systems

From: Paul E. McKenney
Date: Mon May 03 2004 - 15:06:45 EST


On Mon, May 03, 2004 at 09:39:11AM -0700, Jesse Barnes wrote:
> On Sunday, May 2, 2004 11:28 am, Paul E. McKenney wrote:
> > From your numbers below, I would guess that if you have at least
> > 8 CPUs per NUMA node, a two-level tree would suffice. If you have
> > only 4 CPUs per NUMA node, you might well need a per-node level,
> > a per-4-nodes level, and a global level to get the global lock
> > contention reduced sufficiently.
>
> Actually, only 2, but it sounds like your approach would work.

OK, make that a per-node level, a per-8-nodes level, and a global
level. ;-) The per-node level might or might not be helpful,
depending on your memory latencies.

> > Cute! However, it is not clear to me that this approach is
> > compatible with real-time use of RCU, since it results in CPUs
> > processing their callbacks less frequently, and thus getting
> > more of them to process at a time.
>
> I think it was just a proof-of-concept--the current RCU design obviously
> wasn't designed with this machine in mind :).

Agreed! ;-)

> > But it is not clear to me that anyone is looking for realtime
> > response from a 512-CPU machine (yow!!!), so perhaps this
> > is not a problem...
>
> There are folks that would like realtime (or close to realtime) response on
> such systems, so it would be best not to do anything that would explicitly
> prevent it.

The potential problem with less-frequent processing of callbacks is
that it would result in larger "bursts" of callbacks to be processed,
degrading realtime scheduling latency. There are some patches that
help avoid this problem, but they probably need more testing and
tuning.

> > This patch certainly seems simple enough, and I would guess that
> > "jiffies" is referenced often enough that it is warm in the cache
> > despite being frequently updated.
> >
> > Other thoughts?
>
> On a big system like this though, won't reading jiffies frequently be another
> source of contention?

My thought was that each CPU was already reading jiffies several times
each tick anyway, so that it would already be cached when RCU wanted
to look at it. But I must defer to your experience with this particular
machine.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/