Re: call_rcu data race patch

From: Guillaume Morin
Date: Mon Sep 27 2021 - 12:49:49 EST


On 27 Sep 9:10, Paul E. McKenney wrote:
> Very good! Would you be willing to give me your Tested-by?

Of course! Added below after the patch.

> > The first patch is already in your rcu tree and my gut feeling is that
> > it is the one that fixes the issue but you're the expert here... Though
> > I think it should be probably fast tracked and marked for stable?
> >
> > Are you planning on committing the 2nd patch to your tree?
>
> This is the second patch, correct? (Too many patches!)

Correct. And to be 100% clear, the first one is
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=2431774f04d1050292054c763070021bade7b151

> If so, I add your Tested-by and fill out the commit log. It would be
> slated for the v5.17 merge window by default, that is, not the upcoming
> merge window but the one after that. Please let me know if you need
> it sooner.

I personally don't need it sooner. But it's been broken for a while (since
5.4, based on the bugzilla report) and I can't imagine that the original
reporter and we are the only ones hitting this. So my personal opinion
would be to get both patches into Linus's tree and the stable branches
ASAP, but obviously this is entirely up to you.

I do appreciate the help!

> ------------------------------------------------------------------------
>
> commit 1a792b59071b697defd4ccdc8b951cce49de9d2f
> Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Date: Fri Sep 17 15:04:48 2021 -0700
>
> EXP rcu: Tighten rcu_advance_cbs_nowake() checks
>
> This is an experimental shot-in-the-dark debugging patch.
>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6a1e9d3374db..6d692a591f66 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1590,10 +1590,14 @@ static void __maybe_unused rcu_advance_cbs_nowake(struct rcu_node *rnp,
> struct rcu_data *rdp)
> {
> rcu_lockdep_assert_cblist_protected(rdp);
> - if (!rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) ||
> + // Don't do anything unless the current grace period is guaranteed
> + // not to end. This means a grace period in progress and at least
> + // one holdout CPU.
> + if (!rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) || !READ_ONCE(rnp->qsmask) ||
> !raw_spin_trylock_rcu_node(rnp))
> return;
> - WARN_ON_ONCE(rcu_advance_cbs(rnp, rdp));
> + if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) && READ_ONCE(rnp->qsmask))
> + WARN_ON_ONCE(rcu_advance_cbs(rnp, rdp));
> raw_spin_unlock_rcu_node(rnp);
> }
>

Tested-by: Guillaume Morin <guillaume@xxxxxxxxxxx>

--
Guillaume Morin <guillaume@xxxxxxxxxxx>