Re: call_rcu data race patch

From: Paul E. McKenney
Date: Mon Sep 27 2021 - 17:47:13 EST


On Mon, Sep 27, 2021 at 06:49:45PM +0200, Guillaume Morin wrote:
> On 27 Sep 9:10, Paul E. McKenney wrote:
> > Very good! Would you be willing to give me your Tested-by?
>
> Of course! Added below after the patch.
>
> > > The first patch is already in your rcu tree and my gut feeling is that
> > > it is the one that fixes the issue but you're the expert here... Though
> > > I think it should be probably fast tracked and marked for stable?
> > >
> > > Are you planning on committing the 2nd patch to your tree?
> >
> > This is the second patch, correct? (Too many patches!)
>
> Correct. And to be 100% clear, the first one is
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=2431774f04d1050292054c763070021bade7b151
>
> > If so, I add your Tested-by and fill out the commit log. It would be
> > slated for the v5.17 merge window by default, that is, not the upcoming
> > merge window but the one after that. Please let me know if you need
> > it sooner.
>
> I personally don't need it sooner. But it's been broken for a while (5.4
> based on the bugzilla report) and I can't imagine the original reporter
> and we are the only ones hitting this. So my personal opinion would be
> to get both patches into Linus's tree and stable branches asap, but
> obviously this is entirely up to you.

I have it in -next with your Tested-by (thank you!), so let's see how
testing and review go.

> I do appreciate the help!

And thank you for giving that patch a go!

Thanx, Paul

> > ------------------------------------------------------------------------
> >
> > commit 1a792b59071b697defd4ccdc8b951cce49de9d2f
> > Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > Date: Fri Sep 17 15:04:48 2021 -0700
> >
> > EXP rcu: Tighten rcu_advance_cbs_nowake() checks
> >
> > This is an experimental shot-in-the-dark debugging patch.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 6a1e9d3374db..6d692a591f66 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1590,10 +1590,14 @@ static void __maybe_unused rcu_advance_cbs_nowake(struct rcu_node *rnp,
> > struct rcu_data *rdp)
> > {
> > rcu_lockdep_assert_cblist_protected(rdp);
> > - if (!rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) ||
> > + // Don't do anything unless the current grace period is guaranteed
> > + // not to end. This means a grace period in progress and at least
> > + // one holdout CPU.
> > + if (!rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) || !READ_ONCE(rnp->qsmask) ||
> > !raw_spin_trylock_rcu_node(rnp))
> > return;
> > - WARN_ON_ONCE(rcu_advance_cbs(rnp, rdp));
> > + if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) && READ_ONCE(rnp->qsmask))
> > + WARN_ON_ONCE(rcu_advance_cbs(rnp, rdp));
> > raw_spin_unlock_rcu_node(rnp);
> > }
> >
>
> Tested-By: Guillaume Morin <guillaume@xxxxxxxxxxx>
>
> --
> Guillaume Morin <guillaume@xxxxxxxxxxx>