Re: [PATCH V2 1/7] rcu: use preempt_count to test whether scheduler locks is held

From: Paul E. McKenney
Date: Fri Nov 15 2019 - 11:53:53 EST


On Sat, Nov 02, 2019 at 12:45:53PM +0000, Lai Jiangshan wrote:
> Ever since preemption was introduced to the Linux
> kernel, irq-disabled spinlocks have always been
> held with preemption disabled. One reason is that
> we sometimes need spin_unlock(), which does
> preempt_enable(), to unlock an irq-disabled
> spinlock while keeping irqs disabled. So
> preempt_count can be used to test whether
> scheduler locks are possibly held.
>
> CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>

Your point that RCU flavor consolidation allows some
simplifications is an excellent one, so thank you again.

And sorry to be slow, but the interaction with the rest of RCU must
be taken into account. Therefore, doing this patch series justice
does require some time.
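
To make the invariant the commit log relies on concrete, here is a
minimal sketch (my illustration, not part of the patch, and assuming
the current mainline behavior of the _irqsave primitives):

        #include <linux/bug.h>
        #include <linux/preempt.h>
        #include <linux/spinlock.h>

        static DEFINE_RAW_SPINLOCK(demo_lock);

        static void demo(void)
        {
                unsigned long flags;

                /* Disables irqs and, with CONFIG_PREEMPT_COUNT=y,
                 * also bumps preempt_count. */
                raw_spin_lock_irqsave(&demo_lock, flags);
                WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
                             !preempt_count());

                /* raw_spin_unlock() does preempt_enable() but leaves
                 * irqs disabled, which is why the _irqsave variants
                 * must maintain preempt_count rather than rely on
                 * irqs-off implying non-preemptibility. */
                raw_spin_unlock(&demo_lock);
                local_irq_restore(flags);
        }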

> ---
> kernel/rcu/tree_plugin.h | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 0982e9886103..aba5896d67e3 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -603,10 +603,14 @@ static void rcu_read_unlock_special(struct task_struct *t)
> tick_nohz_full_cpu(rdp->cpu);
> // Need to defer quiescent state until everything is enabled.
> if (irqs_were_disabled && use_softirq &&
> - (in_interrupt() ||
> - (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
> + (in_interrupt() || (exp && !preempt_bh_were_disabled))) {

My concern here is that this relies on a side-effect of the _irq locking
primitives. What if someone similar to you comes along and is able to
show significant performance benefits from making raw_spin_lock_irqsave()
and friends leave preempt_count alone? After all, these primitives
disable interrupts, so the bits in preempt_count can be argued to have
no effect.
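
For instance (purely hypothetical, just to make the concern concrete,
and with hypothetical_lock_irqsave() being a name I made up), such an
optimization might look like this:

        #include <linux/spinlock.h>

        /* HYPOTHETICAL, for illustration only: an _irqsave-style
         * acquisition that leaves preempt_count alone on the theory
         * that disabled irqs already prevent preemption. */
        static inline void hypothetical_lock_irqsave(raw_spinlock_t *lock,
                                                     unsigned long *flags)
        {
                local_irq_save(*flags);
                /* No preempt_disable(), so preempt_count can stay at
                 * zero even though a scheduler lock is about to be
                 * held. */
                do_raw_spin_lock(lock);
        }

If something like that went in, the !preempt_bh_were_disabled test
above would see a zero count while (say) an rq lock was held, and the
wakeup it deems safe would not be.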

But this patch is not central to simplifying __rcu_read_unlock().
Plus RCU now re-enables the scheduler-clock tick on nohz_full CPUs that
are blocking normal grace periods, which gives additional flexibility
on this code path -- one of the big concerns when this was written was
that in a PREEMPT=y kernel, a nohz_full CPU spinning in kernel code might
never pass through a quiescent state. And expedited grace periods need
to be fast on average, not worst case.

So another approach might be to:

1. Simplify the above expression to do raise_softirq_irqoff()
only if we are actually in an interrupt handler (see the sketch
below this list).

2. Make expedited grace periods re-enable the scheduler-clock
interrupt on CPUs that are slow to pass through quiescent states.
(Taking care to disable it again, which might require
coordination with the similar logic in normal grace periods.)
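
For step 1, the quoted condition would collapse to something like the
following (my untested sketch, reusing the locals from the hunk quoted
above):

        if (irqs_were_disabled && use_softirq && in_interrupt()) {
                // Safe: raise_softirq_irqoff() does no wakeup when
                // called from an actual interrupt handler.
                raise_softirq_irqoff(RCU_SOFTIRQ);
        } else {
                // Enabling BH or preempt does reschedule, so...
                // (this branch unchanged from the current code)
        }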

As a second step, it might still be possible to continue using
raise_softirq_irqoff() in some of the non-interrupt-handler cases
involving __rcu_read_unlock() with interrupts disabled.

Thoughts?

Thanx, Paul

> // Using softirq, safe to awaken, and we get
> // no help from enabling irqs, unlike bh/preempt.
> + // in_interrupt(): raise_softirq_irqoff() is
> + // guaranteed not to do a wakeup.
> + // !preempt_bh_were_disabled: scheduler locks cannot
> + // be held, since spinlocks are always held with
> + // preempt_disable(), so the wakeup will be safe.
> raise_softirq_irqoff(RCU_SOFTIRQ);
> } else {
> // Enabling BH or preempt does reschedule, so...
> --
> 2.20.1
>