Re: [PATCH RFC tip/core/rcu] rcu: direct algorithmic SRCU implementation

From: Paul E. McKenney
Date: Mon Feb 20 2012 - 20:50:48 EST


On Tue, Feb 21, 2012 at 09:11:47AM +0800, Lai Jiangshan wrote:
> On 02/21/2012 01:44 AM, Paul E. McKenney wrote:
>
> >
> >> My conclusion, we can just remove the check-and-return path to reduce
> >> the complexity since we will introduce call_srcu().
> >
> > If I actually submit the above upstream, that would be quite reasonable.
> > My thought is that patch remains RFC and the upstream version has
> > call_srcu().
>
> Has work on call_srcu() been started or drafted?

I do have a draft design, and am currently beating it into shape.
No actual code yet, though. The general idea at the moment is as follows:

o The state machine must be preemptible. I recently received
a bug report about 200-microsecond latency spikes on a system
with more than a thousand CPUs, so the summation of the per-CPU
counters and subsequent recheck cannot be in a preempt-disable
region. I am therefore currently thinking in terms of a kthread.

o At the moment, having a per-srcu_struct kthread seems excessive.
I am planning on a single kthread to do the counter summation
and checking. Further parallelism might be useful in the future,
but I would want to see someone run into problems before adding
more complexity.

o There needs to be a linked list of srcu_struct structures so
that they can be traversed by the state-machine kthread.

o If there are expedited SRCU callbacks anywhere, the kthread
would scan through the list of srcu_struct structures quickly
(perhaps pausing a few microseconds between). If there are no
expedited SRCU callbacks, the kthread would wait a jiffy or so
between scans.

o If a given srcu_struct structure has been scanned too many times
(say, more than ten times) while waiting for the counters to go
to zero, it loses expeditedness. It makes no sense for the kthread
to go CPU-bound just because some SRCU reader somewhere is blocked
in its SRCU read-side critical section.

o Expedited SRCU callbacks cannot be delayed by normal SRCU
callbacks, but neither can expedited callbacks be allowed to
starve normal callbacks. I am thinking in terms of invoking these
from softirq context, with a pair of multi-tailed callback queues
per CPU, stored in the same structure as the per-CPU counters.

o There are enough srcu_struct structures in the Linux kernel that
it does not make sense to force softirq to dig through them all
any time any one of them has callbacks ready to invoke. One way
to deal with this is to have a per-CPU set of linked lists of
srcu_struct_array structures, so that the kthread enqueues
a given structure when it transitions to having callbacks ready
to invoke, and softirq dequeues it. This can be done locklessly
given that there is only one producer and one consumer.

o We can no longer use the trick of pushing callbacks to another
CPU from the CPU_DYING notifier because it is likely that CPU
hotplug will stop using stop_cpus(). I am therefore thinking
in terms of a set of orphanages (two for normal, two more for
expedited -- one set of each for callbacks ready to invoke,
the other for still-waiting callbacks).

o There will need to be an srcu_barrier() that can be called
before cleanup_srcu_struct(). Otherwise, someone will end up
freeing up an srcu_struct that still has callbacks outstanding.
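
To make the scan-pacing items above concrete, here is a minimal userspace
sketch of one pass of the state-machine kthread over the list of structures.
It is not code from the patch: every name is hypothetical, the "readers"
field stands in for the summed per-CPU counters, and the delay values are
assumptions chosen only to match "a few microseconds" and "a jiffy or so"
(with HZ=100 assumed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names and constants here are hypothetical, for illustration only. */
#define SKETCH_MAX_EXPEDITED_SCANS 10     /* "say, more than ten times" */
#define SKETCH_EXPEDITED_DELAY_US  5      /* assumed "few microseconds" */
#define SKETCH_NORMAL_DELAY_US     10000  /* assumed one jiffy at HZ=100 */

struct sketch_srcu {
	struct sketch_srcu *next; /* list traversed by the kthread */
	bool expedited;           /* expedited callbacks still waiting */
	unsigned int nscans;      /* scans in the current grace period */
	long readers;             /* stand-in for summed per-CPU counters */
};

/* Scan one structure; returns true if its counters have reached zero. */
bool sketch_scan_one(struct sketch_srcu *sp)
{
	if (sp->readers == 0) {
		/* Grace period done: callbacks are ready, nothing left waiting. */
		sp->nscans = 0;
		sp->expedited = false;
		return true;
	}
	/* Scanned too often while a reader is blocked: stop expediting. */
	if (++sp->nscans > SKETCH_MAX_EXPEDITED_SCANS)
		sp->expedited = false;
	return false;
}

/* One pass over the list; returns the delay before the next pass. */
unsigned int sketch_scan_all(struct sketch_srcu *head)
{
	struct sketch_srcu *sp;
	bool any_expedited = false;

	for (sp = head; sp != NULL; sp = sp->next) {
		sketch_scan_one(sp);
		if (sp->expedited)
			any_expedited = true;
	}
	return any_expedited ? SKETCH_EXPEDITED_DELAY_US
			     : SKETCH_NORMAL_DELAY_US;
}
```

With this policy a single blocked reader can keep the kthread scanning
quickly for at most ten passes, after which the scan falls back to the
slow interval.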
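
The "multi-tailed callback queue" mentioned above can be pictured as one
singly linked list with several tail pointers that split it into segments.
The userspace sketch below is only an illustration under assumptions: the
three segments (done, waiting, next) and all names are hypothetical, and a
real design would presumably keep two such queues per CPU (normal and
expedited) and add the needed synchronization, none of which is shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segments: ready to invoke, waiting for GP, newly queued. */
enum { SKETCH_DONE, SKETCH_WAIT, SKETCH_NEXT, SKETCH_NSEGS };

struct sketch_cb {
	struct sketch_cb *next;
	void (*func)(struct sketch_cb *cb);
};

struct sketch_cblist {
	struct sketch_cb *head;
	/* tails[i] points at the pointer that terminates segment i. */
	struct sketch_cb **tails[SKETCH_NSEGS];
};

int sketch_invoked; /* example state for the example callback below */

void sketch_count_cb(struct sketch_cb *cb)
{
	(void)cb;
	sketch_invoked++;
}

void sketch_cblist_init(struct sketch_cblist *l)
{
	int i;

	l->head = NULL;
	for (i = 0; i < SKETCH_NSEGS; i++)
		l->tails[i] = &l->head;
}

/* Append a new callback to the end of the NEXT segment. */
void sketch_cblist_enqueue(struct sketch_cblist *l, struct sketch_cb *cb,
			   void (*func)(struct sketch_cb *cb))
{
	cb->next = NULL;
	cb->func = func;
	*l->tails[SKETCH_NEXT] = cb;
	l->tails[SKETCH_NEXT] = &cb->next;
}

/* Grace period ended: WAIT becomes DONE and NEXT becomes WAIT, in O(1). */
void sketch_cblist_advance(struct sketch_cblist *l)
{
	l->tails[SKETCH_DONE] = l->tails[SKETCH_WAIT];
	l->tails[SKETCH_WAIT] = l->tails[SKETCH_NEXT];
}

/* Detach and invoke everything in DONE; returns the number invoked. */
int sketch_cblist_invoke_done(struct sketch_cblist *l)
{
	struct sketch_cb **old_done = l->tails[SKETCH_DONE];
	struct sketch_cb *stop = *old_done; /* first callback not in DONE */
	struct sketch_cb *cb = l->head;
	struct sketch_cb *next;
	int i, n = 0;

	l->head = stop;
	/* Tails that referenced the detached part now refer to the head. */
	for (i = 0; i < SKETCH_NSEGS; i++)
		if (l->tails[i] == old_done)
			l->tails[i] = &l->head;
	while (cb != stop) {
		next = cb->next; /* read before func() possibly frees cb */
		cb->func(cb);
		n++;
		cb = next;
	}
	return n;
}
```

The point of the extra tail pointers is that moving a whole batch of
callbacks from one state to the next is a pair of pointer assignments,
with no list traversal.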

But what did you have in mind?

> >> This new srcu is very great, especially the SRCU_USAGE_COUNT for every
> >> lock/unlock, which forces any increment/decrement pair to change the
> >> counter for me.
> >
> > Glad you like it! ;-)
> >
> > And thank you for your review and feedback!
>
> Could you add my Reviewed-by when this patch is finally submitted?
>
>
> Reviewed-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>

Will do, thank you!

Thanx, Paul
