Re: PROBLEM: 3.0-rc kernels unbootable since -rc3

From: Paul E. McKenney
Date: Tue Jul 12 2011 - 11:23:08 EST


On Tue, Jul 12, 2011 at 08:15:50AM -0700, Paul E. McKenney wrote:
> On Tue, Jul 12, 2011 at 07:49:36AM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 12, 2011 at 10:12:28AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > [<c042d0f5>] task_waking_fair+0x14 <--
> > > >
> > > > Hmmm... This is a 32-bit system, isn't it?
> > >
> > > Yes. I ran this little loop:
> > >
> > > #!/bin/bash
> > >
> > > ID=`xl list | grep Fedora | awk ' { print $2}'`
> > >
> > > rm -f cpu*.log
> > > while (true) do
> > > xl pause $ID
> > > /usr/lib64/xen/bin/xenctx -s /mnt/tmp/FC15-32/System.map-3.0.0-rc6-julie-tested-dirty -a $ID 0 >> cpu0.log
> > > /usr/lib64/xen/bin/xenctx -s /mnt/tmp/FC15-32/System.map-3.0.0-rc6-julie-tested-dirty -a $ID 1 >> cpu1.log
> > > /usr/lib64/xen/bin/xenctx -s /mnt/tmp/FC15-32/System.map-3.0.0-rc6-julie-tested-dirty -a $ID 2 >> cpu2.log
> > > /usr/lib64/xen/bin/xenctx -s /mnt/tmp/FC15-32/System.map-3.0.0-rc6-julie-tested-dirty -a $ID 3 >> cpu3.log
> > > xl unpause $ID
> > > done
> > >
> > > To get an idea what the CPU is doing before it hits the task_waking_fair
> > > and there isn't anything damning. Here are the logs:
> > >
> > > http://darnok.org/xen/cpu1.log
> >
> > OK, a fair amount of variety, then lots and lots of task_waking_fair(),
> > so I still feel good about asking you for the following.
>
> But... But... But...
>
> Just how accurate are these stack traces? For example, do you have
> frame pointers enabled? If not, could you please enable them?
>
> The reason that I ask is that the wakeme_after_rcu() looks like it is
> being invoked from softirq, which would be grossly illegal and could
> cause any manner of misbehavior. Did someone put a synchronize_rcu()
> into an RCU callback or something? Or did I do something really really
> braindead inside the RCU implementation?
>
> (I am looking into this last question, but would appreciate any and all
> help with the other questions!)

OK, I was confusing Julie's, Ravi's, and Konrad's situations.
The wakeme_after_rcu() is in fact OK to call from softirq -- if and
only if the scheduler is actually running. This is what happens if
you do a synchronize_rcu() given your CONFIG_TREE_RCU setup -- an RCU
callback is posted that, when invoked, awakens the task that invoked
synchronize_rcu().
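
For illustration only, here is a rough single-threaded userspace sketch
of that post-a-callback-then-wait pattern. It is not the kernel code:
every name ending in _sketch is made up for this example, the "done"
flag merely stands in for a completion, and callback invocation is
driven by hand rather than from softirq.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's callback header. */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

/* Waiter state: the callback header plus a flag standing in for a completion. */
struct rcu_synchronize_sketch {
	struct rcu_head_sketch head;	/* first member, so the cast below is valid */
	int done;
};

static struct rcu_head_sketch *cb_list;	/* callbacks awaiting invocation */

/* Queue a callback to be invoked after the (simulated) grace period. */
void call_rcu_sketch(struct rcu_head_sketch *head,
		     void (*func)(struct rcu_head_sketch *head))
{
	head->func = func;
	head->next = cb_list;
	cb_list = head;
}

/* The callback itself: mark the waiting task as awakened. */
void wakeme_after_rcu_sketch(struct rcu_head_sketch *head)
{
	struct rcu_synchronize_sketch *rcu =
		(struct rcu_synchronize_sketch *)head;

	rcu->done = 1;
}

/* Stand-in for callback invocation; returns the number of callbacks run. */
int invoke_callbacks_sketch(void)
{
	int n = 0;

	while (cb_list) {
		struct rcu_head_sketch *head = cb_list;

		cb_list = head->next;
		head->func(head);
		n++;
	}
	return n;
}

/* Post the wakeup callback, then "wait" for it; returns 1 once awakened. */
int synchronize_rcu_sketch(void)
{
	struct rcu_synchronize_sketch rcu = { .done = 0 };

	call_rcu_sketch(&rcu.head, wakeme_after_rcu_sketch);
	assert(!rcu.done);	/* nothing has been invoked yet */

	/* A real task would sleep here; this model runs the callbacks directly. */
	invoke_callbacks_sketch();
	return rcu.done;
}
```

The point of the sketch is only that the wakeup happens from whatever
context invokes the callbacks, which is why the scheduler must already
be running by then.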

And, based on http://darnok.org/xen/log-rcu-stall, Konrad's system
appears to be well past the point where the scheduler is initialized.

So I am coming back around to the loop in task_waking_fair().

Though the patch I sent out earlier might help, for example, if early
invocation of RCU callbacks is somehow messing up the scheduler's
initialization.

Thanx, Paul