Re: [PATCH] a local-timer-free version of RCU

From: Lai Jiangshan
Date: Tue Nov 09 2010 - 04:18:01 EST




On Sat, Nov 6, 2010 at 5:00 AM, Joe Korty <joe.korty@xxxxxxxx> wrote:
> +}
> +
> +/**
> + * rcu_read_lock - mark the beginning of an RCU read-side critical section.
> + *
> + * When synchronize_rcu() is invoked on one CPU while other CPUs
> + * are within RCU read-side critical sections, then the
> + * synchronize_rcu() is guaranteed to block until after all the other
> + * CPUs exit their critical sections. Similarly, if call_rcu() is invoked
> + * on one CPU while other CPUs are within RCU read-side critical
> + * sections, invocation of the corresponding RCU callback is deferred
> + * until after all the other CPUs exit their critical sections.
> + *
> + * Note, however, that RCU callbacks are permitted to run concurrently
> + * with RCU read-side critical sections. One way that this can happen
> + * is via the following sequence of events: (1) CPU 0 enters an RCU
> + * read-side critical section, (2) CPU 1 invokes call_rcu() to register
> + * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
> + * (4) CPU 2 enters an RCU read-side critical section, (5) the RCU
> + * callback is invoked. This is legal, because the RCU read-side critical
> + * section that was running concurrently with the call_rcu() (and which
> + * therefore might be referencing something that the corresponding RCU
> + * callback would free up) has completed before the corresponding
> + * RCU callback is invoked.
> + *
> + * RCU read-side critical sections may be nested. Any deferred actions
> + * will be deferred until the outermost RCU read-side critical section
> + * completes.
> + *
> + * It is illegal to block while in an RCU read-side critical section.
> + */
> +void __rcu_read_lock(void)
> +{
> +	struct rcu_data *r;
> +
> +	r = &per_cpu(rcu_data, smp_processor_id());
> +	if (r->nest_count++ == 0)
> +		/*
> +		 * Set the flags value to show that we are in
> +		 * a read side critical section. The code starting
> +		 * a batch uses this to determine if a processor
> +		 * needs to participate in the batch. Including
> +		 * a sequence allows the remote processor to tell
> +		 * that a critical section has completed and another
> +		 * has begun.
> +		 */

A memory barrier is needed here, as Paul noted.

> +		r->flags = IN_RCU_READ_LOCK | (r->sequence++ << 2);
> +}
> +EXPORT_SYMBOL(__rcu_read_lock);
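
Something like the following, perhaps; whether smp_mb() is the right
strength here is for Paul to confirm, and note that it is exactly the
kind of fast-path cost I complain about below:

	void __rcu_read_lock(void)
	{
		struct rcu_data *r;

		r = &per_cpu(rcu_data, smp_processor_id());
		if (r->nest_count++ == 0) {
			r->flags = IN_RCU_READ_LOCK | (r->sequence++ << 2);
			/*
			 * Order the flag store before the reads in the
			 * critical section, so the CPU starting a batch
			 * cannot miss a section already in progress.
			 */
			smp_mb();
		}
	}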
> +
> +/**
> + * rcu_read_unlock - marks the end of an RCU read-side critical section.
> + * Check if an RCU batch was started while we were in the critical
> + * section. If so, call rcu_quiescent() to join the rendezvous.
> + *
> + * See rcu_read_lock() for more information.
> + */
> +void __rcu_read_unlock(void)
> +{
> +	struct rcu_data *r;
> +	int cpu, flags;
> +
> +	cpu = smp_processor_id();
> +	r = &per_cpu(rcu_data, cpu);
> +	if (--r->nest_count == 0) {
> +		flags = xchg(&r->flags, 0);
> +		if (flags & DO_RCU_COMPLETION)
> +			rcu_quiescent(cpu);
> +	}
> +}
> +EXPORT_SYMBOL(__rcu_read_unlock);

It is hardly acceptable to have memory barriers or atomic operations
in the fast paths of rcu_read_lock() and rcu_read_unlock().

We need something to drive the completion of grace periods (and the
processing of callbacks). There is no free lunch: if GP completion is
driven by rcu_read_unlock(), we very probably need memory barriers or
atomic operations in the fast paths of rcu_read_lock() and
rcu_read_unlock().

We need to look for periodic or continuous kernel events to drive
grace periods instead; the scheduler tick and schedule() are the most
suitable event sources in the kernel, I think.
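
For reference, the tick already drives RCU today; paraphrasing
update_process_times() in kernel/timer.c (the exact body varies by
version):

	void update_process_times(int user_tick)
	{
		struct task_struct *p = current;
		int cpu = smp_processor_id();

		account_process_tick(p, user_tick);
		run_local_timers();
		/* Reports quiescent states and advances the grace period. */
		rcu_check_callbacks(cpu, user_tick);
		scheduler_tick();
		run_posix_cpu_timers(p);
	}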

The scheduler tick and schedule() do not happen under NO_HZ and
dyntick-hpc, so we need some approach to fix that. I vote for Paul's
approach #5.

Also, I propose an immature idea here.

Don't tell RCU about dyntick-hpc mode; instead, stop a CPU's RCU
function the first time RCU disturbs dyntick-hpc mode or NO_HZ mode.
A rough sketch follows the list of pros and cons below.

rcu_read_lock():
	if the RCU function of this CPU is not started, start it and
	start an RCU timer;
	handle rcu_read_lock();

enter NO_HZ:
	if interrupts have just been happening very frequently, do
	nothing; else stop the RCU function and the RCU timer of the
	current CPU;

exit interrupt:
	if this interrupt was caused only by the RCU timer && it merely
	disturbs dyntick-hpc mode or NO_HZ mode (and these modes will
	be reentered), stop the RCU function and the RCU timer of the
	current CPU;

schedule-tick:
	requeue the RCU timer before it causes an unneeded interrupt;
	handle RCU things;

+ No big changes to RCU; the same code handles dyntick-hpc mode and
NO_HZ mode, reusing some code from rcu_offline_cpu().

+ No need to inform RCU of user/kernel transitions.

+ No need to turn scheduling-clock interrupts on
at each user/kernel transition.

- We must carefully handle some critical regions which also imply
RCU critical regions.

- Introduces some unneeded interrupts, but they are very infrequent.
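
Very roughly, the timer half of this idea might look like the sketch
below. All the names here (rcu_timer, rcu_started, rcu_cpu_start(),
rcu_cpu_stop()) are hypothetical, and rcu_check_callbacks() stands in
for whatever per-CPU grace-period work the real implementation would
reuse from rcu_offline_cpu():

	/*
	 * Sketch only.  Timers are assumed to be initialized at boot with
	 * setup_timer(&per_cpu(rcu_timer, cpu), rcu_timer_fn, cpu).
	 */
	static DEFINE_PER_CPU(struct timer_list, rcu_timer);
	static DEFINE_PER_CPU(int, rcu_started);

	static void rcu_timer_fn(unsigned long cpu)
	{
		/* The periodic event that stands in for the scheduler tick. */
		rcu_check_callbacks(cpu, 0);
		mod_timer(&per_cpu(rcu_timer, cpu), jiffies + HZ / 10);
	}

	/* Called from the rcu_read_lock() slow path, the first time only. */
	void rcu_cpu_start(int cpu)
	{
		if (!per_cpu(rcu_started, cpu)) {
			per_cpu(rcu_started, cpu) = 1;
			mod_timer(&per_cpu(rcu_timer, cpu), jiffies + HZ / 10);
		}
	}

	/* Called when entering NO_HZ/dyntick-hpc with few interrupts. */
	void rcu_cpu_stop(int cpu)
	{
		per_cpu(rcu_started, cpu) = 0;
		del_timer(&per_cpu(rcu_timer, cpu));
		/* Hand off this CPU's pending callbacks, as rcu_offline_cpu() does. */
	}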

Thanks,
Lai

> +
> +/**
> + * call_rcu - Queue an RCU callback for invocation after a grace period.
> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual update function to be invoked after the grace period
> + *
> + * The update function will be invoked some time after a full grace
> + * period elapses, in other words after all currently executing RCU
> + * read-side critical sections have completed. RCU read-side critical
> + * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
> + * and may be nested.
> + */
> +void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
> +{
> +	struct rcu_data *r;
> +	unsigned long flags;
> +	int cpu;
> +
> +	head->func = func;
> +	head->next = NULL;
> +	local_irq_save(flags);
> +	cpu = smp_processor_id();
> +	r = &per_cpu(rcu_data, cpu);
> +	/*
> +	 * Avoid mixing new entries with batches which have already
> +	 * completed or have a grace period in progress.
> +	 */
> +	if (r->nxt.head && rcu_move_if_done(r))
> +		rcu_wake_daemon(r);
> +
> +	rcu_list_add(&r->nxt, head);
> +	if (r->nxtcount++ == 0) {

A memory barrier is needed here (before reading rcu_batch).

> +		r->nxtbatch = (rcu_batch & RCU_BATCH_MASK) + RCU_INCREMENT;
> +		barrier();
> +		if (!rcu_timestamp)
> +			rcu_timestamp = jiffies ?: 1;
> +	}
> +	/* If we reach the limit start a batch. */
> +	if (r->nxtcount > rcu_max_count) {
> +		if (rcu_set_state(RCU_NEXT_PENDING) == RCU_COMPLETE)
> +			rcu_start_batch();
> +	}
> +	local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(call_rcu);
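
That is, something like this in the nxtcount branch (a sketch; whether
smp_mb() is the right strength here is for Paul to confirm):

	rcu_list_add(&r->nxt, head);
	if (r->nxtcount++ == 0) {
		/*
		 * Order the list add above before the read of rcu_batch
		 * below, so a new callback cannot be assigned to a batch
		 * whose grace period has already begun.
		 */
		smp_mb();
		r->nxtbatch = (rcu_batch & RCU_BATCH_MASK) + RCU_INCREMENT;
		barrier();
		if (!rcu_timestamp)
			rcu_timestamp = jiffies ?: 1;
	}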
> +
> +


> +/*
> + * Process the completed RCU callbacks.
> + */
> +static void rcu_process_callbacks(struct rcu_data *r)
> +{
> +	struct rcu_head *list, *next;
> +
> +	local_irq_disable();
> +	rcu_move_if_done(r);
> +	list = r->done.head;
> +	rcu_list_init(&r->done);
> +	local_irq_enable();
> +

A memory barrier is needed here (after reading rcu_batch).

> +	while (list) {
> +		next = list->next;
> +		list->func(list);
> +		list = next;
> +	}
> +}
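
That is, something like this (a sketch, pairing with the barrier
suggested for call_rcu() above):

	local_irq_disable();
	rcu_move_if_done(r);
	list = r->done.head;
	rcu_list_init(&r->done);
	local_irq_enable();

	/*
	 * Pairs with the barrier in call_rcu(): order the batch-state
	 * reads above before the callback invocations below.
	 */
	smp_mb();

	while (list) {
		next = list->next;
		list->func(list);
		list = next;
	}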