Re: [RFC] rcu: Warn that rcu ktheads cannot be spawned

From: Byungchul Park
Date: Tue Jun 25 2019 - 22:52:15 EST


On Tue, Jun 25, 2019 at 06:31:15AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 25, 2019 at 11:41:00AM +0900, Byungchul Park wrote:
> > On Mon, Jun 24, 2019 at 10:25:51AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jun 24, 2019 at 12:46:24PM -0400, Joel Fernandes wrote:
> > > > On Mon, Jun 24, 2019 at 05:27:32PM +0900, Byungchul Park wrote:
> > > > > Hello rcu folks,
> > > > >
> > > > > I thought it'd better to announce it if those spawnings fail because of
> > > > > !rcu_scheduler_fully_active.
> > > > >
> > > > > Of course, with the current code, it never happens though.
> > > > >
> > > > > Thoughts?
> > > >
> > > > It seems in the right spirit, but with your patch a warning always fires.
> > > > rcu_prepare_cpu() is called multiple times, once from rcu_init() and then
> > > > from hotplug paths.
> > > >
> > > > Warning splat stack looks like:
> > > >
> > > > [ 0.398767] Call Trace:
> > > > [ 0.398775] rcu_init+0x6aa/0x724
> > > > [ 0.398779] start_kernel+0x220/0x4a2
> > > > [ 0.398780] ? copy_bootdata+0x12/0xac
> > > > [ 0.398782] secondary_startup_64+0xa4/0xb0
> > >
> > > Thank you both, and I will remove this from my testing queue.
> > >
> > > As Joel says, this is called at various points in the boot sequence, not
> > > all of which are far enough along to support spawning kthreads.
> > >
> > > The real question here is "What types of bugs are we trying to defend
> > > against?" But keeping in mind existing diagnostics. For example, are
> > > there any kthreads for which a persistent failure to spawn would not
> > > emit any error message. My belief is that any such persistent failure
> > > would result in either an in-kernel diagnostic or an rcutorture failure,
> > > but I might well be missing something.
> > >
> > > Thoughts? Or, more to the point, tests demonstrating silence in face
> > > of such a persistent failure?
> >
> > You are right. There wouldn't be a persistent failure because the path
> > turning cpus on always tries to spawn them, *even* in case that the
> > booting sequence is wrong. The current code anyway goes right though.
> >
> > I thought a hole can be there if the code changes so that those kthreads
> > cannot be spawned until the cpu being up, which is the case I was
> > interested in. Again, it's gonna never happen with the current code
> > because it spawns them after setting rcu_scheduler_fully_active to 1 in
> > rcu_spawn_gp_kthead().
> >
> > And I wrongly thought you placed the rcu_scheduler_fully_active check on
> > spawning just in case. But it seems to be not the case.
> >
> > So I'd better stop working on the warning patch. :) Instead, please
> > check the following trivial fix.
> >
> > Thanks,
> > Byungchul
> >
> > ---8<---
> > >From 1293d19bb7abf7553d656c81182118eff54e7dc9 Mon Sep 17 00:00:00 2001
> > From: Byungchul Park <byungchul.park@xxxxxxx>
> > Date: Mon, 24 Jun 2019 16:22:11 +0900
> > Subject: [PATCH] rcu: Make rcu_spawn_one_boost_kthread() return void
> >
> > The return value of rcu_spawn_one_boost_kthread() is not used any
> > longer. Change the return type from int to void.
> >
> > Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
>
> Looks pretty good, just one comment below. Plus could you please tell
> me what you are developing this against?

Only for removing out-dated code. Not for funcational change.

And simple answer below...

>
> Thanx, Paul
>
> > ---
> > kernel/rcu/tree_plugin.h | 15 +++++++--------
> > 1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 1102765..4e11aa4 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1131,7 +1131,7 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
> > * already exist. We only create this kthread for preemptible RCU.
> > * Returns zero if all is well, a negated errno otherwise.
> > */
> > -static int rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> > +static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> > {
> > int rnp_index = rnp - rcu_get_root();
> > unsigned long flags;
> > @@ -1139,25 +1139,24 @@ static int rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> > struct task_struct *t;
> >
> > if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
> > - return 0;
> > + return;
> >
> > if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0)
> > - return 0;
> > + return;
> >
> > rcu_state.boost = 1;
> > if (rnp->boost_kthread_task != NULL)
> > - return 0;
> > + return;
> > t = kthread_create(rcu_boost_kthread, (void *)rnp,
> > "rcub/%d", rnp_index);
> > if (IS_ERR(t))
>
> This would be a change in behavior, but it might be good to have a
> WARN_ON_ONCE() above. Assuming that it doesn't splat on every boot. ;-)

Yes. Normally it shouldn't. Right? I will resend this after testing with
my machine. :)

Thanks,
Byungchul

>
> > - return PTR_ERR(t);
> > + return;
> > raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > rnp->boost_kthread_task = t;
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > sp.sched_priority = kthread_prio;
> > sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
> > wake_up_process(t); /* get to TASK_INTERRUPTIBLE quickly. */
> > - return 0;
> > }
> >
> > static void rcu_cpu_kthread_setup(unsigned int cpu)
> > @@ -1265,7 +1264,7 @@ static void __init rcu_spawn_boost_kthreads(void)
> > if (WARN_ONCE(smpboot_register_percpu_thread(&rcu_cpu_thread_spec), "%s: Could not start rcub kthread, OOM is now expected behavior\n", __func__))
> > return;
> > rcu_for_each_leaf_node(rnp)
> > - (void)rcu_spawn_one_boost_kthread(rnp);
> > + rcu_spawn_one_boost_kthread(rnp);
> > }
> >
> > static void rcu_prepare_kthreads(int cpu)
> > @@ -1275,7 +1274,7 @@ static void rcu_prepare_kthreads(int cpu)
> >
> > /* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
> > if (rcu_scheduler_fully_active)
> > - (void)rcu_spawn_one_boost_kthread(rnp);
> > + rcu_spawn_one_boost_kthread(rnp);
> > }
> >
> > #else /* #ifdef CONFIG_RCU_BOOST */
> > --
> > 1.9.1
> >