Re: [patch 59/66] rcu: Convert rcutree to hotplug state machine

From: Paul E. McKenney
Date: Fri Aug 19 2016 - 17:15:02 EST


On Fri, Aug 19, 2016 at 10:12:00PM +0200, Sebastian Andrzej Siewior wrote:
> On 2016-08-18 11:30:13 [-0700], Paul E. McKenney wrote:
> > > > --- a/kernel/cpu.c
> > > > +++ b/kernel/cpu.c
> > > > @@ -882,6 +882,7 @@ void notify_cpu_starting(unsigned int cpu)
> > > > struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
> > > > enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
> > > >
> > > > + rcu_cpu_starting(cpu); /* All CPU_STARTING notifiers can use RCU. */
> > > > while (st->state < target) {
> > > > struct cpuhp_step *step;
> > >
> > > What happens if something post CPUHP_AP_ONLINE fails and we do a
> > > rollback to 0? Do we need to revert / undo rcu_cpu_starting() doing?
> >
> > Yes, by calling rcu_cleanup_dying_idle_cpu().
>
> Thank you for that :)

No problem! ;-)

> > But please note that rcu_cpu_starting() is invoked from the CPU itself
> > during the onlining process. Is it really possible to fail beyond
> > that point?
>
> "sure". We enter here at CPUHP_BRINGUP_CPU. The next states until
> (approx) CPUHP_AP_ONLINE are currently defined as "can't fail". As you
> see in notify_cpu_starting() the return code of cpuhp_invoke_callback()
> isn't checked and there is no rollback.
> Later, CPUHP_AP_SMPBOOT_THREADS could fail (not in current but from here
> on we no longer the return code). At this point we would return to where
> we started (CPUHP_OFFLINE is most cases). During the rollback we get to
> rcu_cleanup_dying_idle_cpu() via rcu_report_dead() so that is fine.

OK, so any later failure looks to RCU like the CPU came online, then
immediately went offline. Works for me!

> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 3121242b8579..5e7c1d6a6108 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3793,8 +3793,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
> > > > rnp = rdp->mynode;
> > > > mask = rdp->grpmask;
> > > > raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
> > > > - rnp->qsmaskinitnext |= mask;
> > > > - rnp->expmaskinitnext |= mask;
> > > > if (!rdp->beenonline)
> > > > WRITE_ONCE(rsp->ncpus, READ_ONCE(rsp->ncpus) + 1);
> > > > rdp->beenonline = true; /* We have now been online. */
> > > > @@ -4211,8 +4235,10 @@ void __init rcu_init(void)
> > > > */
> > > > cpu_notifier(rcu_cpu_notify, 0);
> > > > pm_notifier(rcu_pm_notify, 0);
> > > > - for_each_online_cpu(cpu)
> > > > + for_each_online_cpu(cpu) {
> > > > rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
> > > > + rcu_cpu_starting(cpu);
> > > > + }
> > >
> > > and looking at this from x86-64 then at this point I have CPU0 online
> > > and all other are down (or not yet up). So this is invoked for one CPU
> > > only. And later via hotplug callbacks for the other CPUs.
> >
> > Yes, that is the current situation. The reason for the loop is in
> > case someone clever decides to online other CPUs before RCU initializes
> > itself. No idea how anyone would make that sort of thing work, but I
> > have learned not to underestimate the creativity of the fast-boot guys.
>
> doubt this would work. start_kernel() is invoked from the boot CPU.
> Other CPUs are usually down because the scheduler wasn't up yet (so they
> can't idle in their idle thread nor could they be marked "online" in the
> CPU mask). Later (rest_init() -> kernel_init() ->
> kernel_init_freeable()) there is smp_init() which boots the additional
> CPUs.

Just as there is currently an early boot implementation of printk(),
some crazy person might well create an early boot implementation of the
scheduler that had fixed per-CPU tasks for which blocking is forbidden.
I understand that this is crazy, but much else has been considered crazy
not long before inclusion. PREEMPT_RT, for but one example. ;-)

> > And please do not invoke rcu_cpu_starting() before rcu_init(), at least
> > not without some careful inspection and likely reworking of rcu_init()
> > and the things that it calls.
>
> I did not intend to. I was thinking about moving it to
> CPUHP_AP_RCUTREE_DYING but since the teardown method does not match it
> makes no sense.

Whew!!! ;-)

> Thank you for clarifying the counterpart of rcu_cpu_starting().

Again, no problem!

Thanx, Paul