Re: RCU used on incoming CPU before rcu_cpu_starting() called

From: Paul E. McKenney
Date: Thu Mar 09 2017 - 10:51:20 EST


On Thu, Mar 09, 2017 at 07:29:26AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 09, 2017 at 04:12:55PM +0100, Peter Zijlstra wrote:
> > On Thu, Mar 09, 2017 at 02:08:23PM +0100, Thomas Gleixner wrote:
> > > On Wed, 8 Mar 2017, Paul E. McKenney wrote:
> > > > [ 30.694013] lockdep_rcu_suspicious+0xe7/0x120
> > > > [ 30.694013] get_work_pool+0x82/0x90
> > > > [ 30.694013] __queue_work+0x70/0x5f0
> > > > [ 30.694013] queue_work_on+0x33/0x70
> > > > [ 30.694013] clear_sched_clock_stable+0x33/0x40
> > > > [ 30.694013] early_init_intel+0xe7/0x2f0
> > > > [ 30.694013] init_intel+0x11/0x350
> > > > [ 30.694013] identify_cpu+0x344/0x5a0
> > > > [ 30.694013] identify_secondary_cpu+0x18/0x80
> > > > [ 30.694013] smp_store_cpu_info+0x39/0x40
> > > > [ 30.694013] start_secondary+0x4e/0x100
> > > > [ 30.694013] start_cpu+0x14/0x14
> > > >
> > > > Here is the relevant code from x86's smp_callin():
> > > >
> > > > 	/*
> > > > 	 * Save our processor parameters. Note: this information
> > > > 	 * is needed for clock calibration.
> > > > 	 */
> > > > 	smp_store_cpu_info(cpuid);
> > > >
> > > > The problem is that smp_store_cpu_info() indirectly invokes
> > > > schedule_work(), which wants to use RCU. But RCU isn't informed
> > > > of the incoming CPU until the call to notify_cpu_starting(), which
> > > > causes lockdep to complain bitterly about the use of RCU by the
> > > > premature call to schedule_work().
> > >
> > > Right. And that wants to be fixed, not hacked around by silencing RCU.
> > >
> > > Peter????
> >
> > I'm thinking this is hotplug? 30 seconds after boot is far too late for
> > SMP bringup, or you have a stupid slow machine.
>
> And this certainly does qualify as "shortly", thank you!
>
> Yes, this only happens on hotplug with lockdep enabled, specifically
> on rcutorture scenarios TASKS01 and TREE05.
>
> > Because it only calls schedule_work() after SMP-init. In which case
> > there's then two cases, either:
> >
> > - TSC was stable, hotplug wrecked it, TSC is now unstable, and we're
> > screwed.
> >
> > - TSC was unstable, hotplug triggers and we want to mark it unstable
> > _again_.
> >
> > If this is the second, the below should fix it; if it's the first, I've
> > no idea yet on how to fix that properly :/
>
> I have applied this patch and started tests on TREE05 and TASKS01, should
> get results shortly.

And the below patch passed light rcutorture testing, so looking good!

Thanx, Paul

> > Bloody hotplug..
> >
> > ---
> > kernel/sched/clock.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> > index a08795e..eecf388 100644
> > --- a/kernel/sched/clock.c
> > +++ b/kernel/sched/clock.c
> > @@ -172,7 +172,7 @@ void clear_sched_clock_stable(void)
> >
> >  	smp_mb(); /* matches sched_clock_init_late() */
> >
> > -	if (sched_clock_running == 2)
> > +	if (sched_clock_running == 2 && sched_clock_stable())
> >  		schedule_work(&sched_clock_work);
> >  }
> >
> >
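
For anyone following along, the effect of the added check in the patch
above can be modeled in user space. This is only a rough sketch, not the
kernel code: schedule_work_stub(), work_queued, and clock_stable are
made-up names for illustration, and flipping the flag inline is an
assumption standing in for whatever the deferred work does in the real
kernel. The point is just that a repeated "mark unstable" request, such
as the one hotplug triggers on an already-unstable TSC, no longer queues
any work.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real definitions. */
static int sched_clock_running = 2;  /* 2 is the value the quoted patch tests for */
static bool clock_stable = true;     /* models what sched_clock_stable() reports */
static int work_queued;              /* counts simulated schedule_work() calls */

static bool sched_clock_stable(void)
{
	return clock_stable;
}

/*
 * Stand-in for schedule_work(&sched_clock_work).  Assumption: the
 * deferred work eventually marks the clock unstable; this model does
 * that immediately to keep the example small.
 */
static void schedule_work_stub(void)
{
	work_queued++;
	clock_stable = false;
}

/* Mirrors the patched condition: queue work only on a stable -> unstable change. */
static void clear_sched_clock_stable(void)
{
	if (sched_clock_running == 2 && sched_clock_stable())
		schedule_work_stub();
}
```

Calling clear_sched_clock_stable() a second time in this model does
nothing, which corresponds to the second case Peter describes (TSC
already unstable, hotplug wants to mark it unstable again). It does not
address the first case, where the TSC was stable until hotplug.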