Re: [RFC] Deadlock via recursive wakeup via RCU with threadirqs

From: Paul E. McKenney
Date: Fri Jun 28 2019 - 18:25:56 EST


On Fri, Jun 28, 2019 at 05:40:18PM -0400, Joel Fernandes wrote:
> Hi Paul,
>
> On Fri, Jun 28, 2019 at 01:04:23PM -0700, Paul E. McKenney wrote:
> [snip]
> > > > > Commit
> > > > > - 23634ebc1d946 ("rcu: Check for wakeup-safe conditions in
> > > > > rcu_read_unlock_special()") does not trigger the bug within 94
> > > > > attempts.
> > > > >
> > > > > - 48d07c04b4cc1 ("rcu: Enable elimination of Tree-RCU softirq
> > > > > processing") needed 12 attempts to trigger the bug.
> > > >
> > > > That matches my belief that 23634ebc1d946 ("rcu: Check for wakeup-safe
> > > > conditions in rcu_read_unlock_special()") will at least greatly decrease
> > > > the probability of this bug occurring.
> > >
> > > I was just typing a reply that I can't reproduce it with:
> > > rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()
> > >
> > > I am trying to revert enough of this patch to see what would break things,
> > > however I think a better exercise might be to understand more what the patch
> > > does why it fixes things in the first place ;-) It is probably the
> > > deferred_qs thing.
> >
> > The deferred_qs flag is part of it! Looking forward to hearing what
> > you come up with as being the critical piece of this commit.
>
> The new deferred_qs flag indeed saves the machine from the dead-lock.
>
> If we don't want the deferred_qs, then the below patch also fixes the issue.
> However, I am more sure than not that it does not handle all cases (such as
> what if we previously had an expedited grace period IPI in a previous reader
> section and had to to defer processing. Then it seems a similar deadlock
> would present. But anyway, the below patch does fix it for me! It is based on
> your -rcu tree commit 23634ebc1d946f19eb112d4455c1d84948875e31 (rcu: Check
> for wakeup-safe conditions in rcu_read_unlock_special()).

The point here being that you rely on .b.blocked rather than
.b.deferred_qs. Hmmm... There are a number of places that check all
the bits via the .s leg of the rcu_special union. The .s check in
rcu_preempt_need_deferred_qs() should be OK because it is conditioned
on t->rcu_read_lock_nesting of zero or negative.

Do rest of those also work out OK?

It would be nice to remove the flag, but doing so clearly needs careful
review and testing.

Thanx, Paul

> ---8<-----------------------
>
> From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> Subject: [PATCH] Fix RCU recursive deadlock
>
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> ---
> include/linux/sched.h | 2 +-
> kernel/rcu/tree_plugin.h | 17 +++++++++++++----
> 2 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 942a44c1b8eb..347e6dfcc91b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -565,7 +565,7 @@ union rcu_special {
> u8 blocked;
> u8 need_qs;
> u8 exp_hint; /* Hint for performance. */
> - u8 deferred_qs;
> + u8 pad;
> } b; /* Bits. */
> u32 s; /* Set of bits. */
> };
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 75110ea75d01..5b9b12c1ba5c 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -455,7 +455,6 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> local_irq_restore(flags);
> return;
> }
> - t->rcu_read_unlock_special.b.deferred_qs = false;
> if (special.b.need_qs) {
> rcu_qs();
> t->rcu_read_unlock_special.b.need_qs = false;
> @@ -608,13 +607,24 @@ static void rcu_read_unlock_special(struct task_struct *t)
> if (preempt_bh_were_disabled || irqs_were_disabled) {
> t->rcu_read_unlock_special.b.exp_hint = false;
> // Need to defer quiescent state until everything is enabled.
> +
> + /* If unlock_special was called in the current reader section
> + * just because we were blocked in a previous reader section,
> + * then raising softirqs can deadlock. This is because the
> + * scheduler executes RCU sections with preemption disabled,
> + * however it may have previously blocked in a previous
> + * non-scheduler reader section and .blocked got set. It is
> + * never safe to call unlock_special from the scheduler path
> + * due to recursive wake ups (unless we are in_irq(), so
> + * prevent this by checking if we were previously blocked.
> + */
> if (irqs_were_disabled && use_softirq &&
> - (in_irq() || !t->rcu_read_unlock_special.b.deferred_qs)) {
> + (!t->rcu_read_unlock_special.b.blocked || in_irq())) {
> // Using softirq, safe to awaken, and we get
> // no help from enabling irqs, unlike bh/preempt.
> raise_softirq_irqoff(RCU_SOFTIRQ);
> } else if (irqs_were_disabled && !use_softirq &&
> - !t->rcu_read_unlock_special.b.deferred_qs) {
> + !t->rcu_read_unlock_special.b.blocked) {
> // Safe to awaken and we get no help from enabling
> // irqs, unlike bh/preempt.
> invoke_rcu_core();
> @@ -623,7 +633,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
> set_tsk_need_resched(current);
> set_preempt_need_resched();
> }
> - t->rcu_read_unlock_special.b.deferred_qs = true;
> local_irq_restore(flags);
> return;
> }
> --
> 2.22.0.410.gd8fdbe21b5-goog
>