Re: [RFC 2/2] rcu: Remove ->dynticks_nmi_nesting from struct rcu_dynticks

From: Paul E. McKenney
Date: Wed Jun 20 2018 - 13:38:50 EST


On Thu, Jun 21, 2018 at 02:15:07AM +0900, Byungchul Park wrote:
> On Thu, Jun 21, 2018 at 1:49 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> [...]
>
> > Perhaps the fact that there are architectures that can enter interrupt
> > handlers and never leave them when the CPU is non-idle. One example of
> > this is the usermode upcalls in the comment that you removed.
> >
> > Or have all the architectures been modified so that each and every call to
> > rcu_irq_enter() and to rcu_irq_exit() are now properly paired and nested?
> >
> > Proper nesting and pairing was -not- present in the past, hence the
> > special updates (AKA "crowbar") to the counters when transitioning to
> > and from idle.
>
> Thank you very much for explaining it in detail.
>
> > If proper nesting and pairing of rcu_irq_enter() and rcu_irq_exit()
> > is now fully in force across all architectures and configurations, the
> > commit log needs to say this, preferably pointing to the corresponding
> > commits that made this change.
>
> Right.
>
> > There is a call to rcu_irq_enter() without a corresponding call to
> > rcu_irq_exit(), so that the ->dynticks_nesting counter never goes back to
> > zero so that the next time this CPU goes idle, RCU thinks that the CPU
> > is still non-idle. This can result in excessively long grace periods
> > and needless IPIs to idle CPUs.
>
> No doubt.
>
> > But only if calls to rcu_irq_enter() and rcu_irq_exit() are now always
> > properly paired and nested, which was definitely -not- the case last
> > I looked.
>
> I missed it. Right. It's worth only in the case that calls to rcu_irq_enter()
> and rcu_irq_exit() are always properly paired and nested.
>
> > OK, so I can further consider this pair of patches only if
> > all architectures now properly pair and nest rcu_irq_enter() and
> > rcu_irq_exit(). It would be very good if they did, but actually testing
> > back in the day showed that they did not. If that has changed, that
> > would be a very good thing, but if not, this patch injects bugs.
>
> Totally agree with you. Sorry bothering you.

Absolutely not a problem, absolutely no need to apologize! I am
actually very happy that you are taking RCU seriously and looking at it
in such depth.

My problem is that when I see a patch like this, something in the back of
my head screams "WRONG!!!", and I sometimes get confused about exactly
what the back of my head is screaming about, which was the case here.
Hence my misguided initial complaint about NMI nesting instead of about
the possibility of unpaired rcu_irq_enter() calls.

So apologies for that, but I unfortunately cannot promise that this
won't happen again. I have learned the hard way to trust the back of
my head. It sometimes makes mistakes, but less often than the rest of
my head does. ;-)

In the meantime, is it possible to rearrange rcu_irq_enter() and
rcu_nmi_enter() (and similarly rcu_irq_exit() and rcu_nmi_exit())
to avoid the conditionals (via compiler inlining) while still keeping
function calls ordered properly? I bet that you could do it by splitting
rcu_nmi_enter() and rcu_nmi_exit() sort of like this:

static void rcu_nmi_enter_common(bool irq)
{
/*
* You fill this in. Maybe __always_inline above. The
* rcu_dynticks_task_exit() and rcu_cleanup_after_idle()
* calls need to be on opposite sides of the
* rcu_dynticks_eqs_exit() call, just like they are now.
*/
}

void rcu_nmi_enter(void)
{
rcu_nmi_enter_common(false);
}

void rcu_irq_enter(void)
{
lockdep_assert_irqs_disabled();
rcu_nmi_enter(true);
}

Saving a couple of branches on the irq enter/exit paths seems like it
just might be worth something. ;-)

Thanx, Paul