Re: frequent lockups in 3.18rc4

From: Don Zickus
Date: Tue Nov 18 2014 - 16:26:09 EST


On Tue, Nov 18, 2014 at 08:28:01PM +0100, Thomas Gleixner wrote:
> On Tue, 18 Nov 2014, Linus Torvalds wrote:
> > On Tue, Nov 18, 2014 at 6:52 AM, Dave Jones <davej@xxxxxxxxxx> wrote:
> > >
> > > Here's the first hit. Curiously, one cpu is missing.
> >
> > That might be the CPU3 that isn't responding to IPIs due to some bug..
> >
> > > NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]
> > > RIP: 0010:[<ffffffffa91a0db0>] [<ffffffffa91a0db0>] bad_range+0x0/0x90
> >
> > Hmm. Something looping in the page allocator? Not waiting for a lock,
> > but livelocked? I'm not seeing anything here that should trigger the
> > NMI watchdog at all.
> >
> > Can the NMI watchdog get confused somehow?
>
> That's the soft lockup detector which runs from the timer interrupt
> not from NMI.
>
> > So it does look like CPU3 is the problem, but sadly, CPU3 is
> > apparently not listening, and doesn't even react to the NMI, much less
>
> As I said in the other mail. It gets the NMI and reacts on it. It's
> just mangled into the CPU0 backtrace.

I was going to reply about both points too. :-) Though the mangling looks
odd because we have spin_locks serializing the output for each cpu.
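
To illustrate what I mean by serializing, here is a minimal userspace
sketch. It is not the kernel code: a pthread mutex stands in for the
spinlock, threads stand in for cpus, and every name in it (dump_lock,
dump_cpu, run_dumps, log_buf) is made up for the example. Each "cpu"
takes the lock before writing its multi-line dump, so the lines of one
dump should never be interleaved with another's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NCPUS 4
#define LINES_PER_DUMP 3
#define LOG_SIZE 1024

/* Shared log standing in for the console (hypothetical names throughout). */
static char log_buf[LOG_SIZE];
static size_t log_len;
static pthread_mutex_t dump_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append one line to the shared log; the caller must hold dump_lock. */
static void log_line(int cpu, int line)
{
    int n = snprintf(log_buf + log_len, LOG_SIZE - log_len,
                     "cpu%d line%d\n", cpu, line);

    assert(n > 0 && (size_t)n < LOG_SIZE - log_len);
    log_len += (size_t)n;
}

/* Each "cpu" writes its whole dump while holding the lock. */
static void *dump_cpu(void *arg)
{
    int cpu = *(int *)arg;

    pthread_mutex_lock(&dump_lock);
    for (int i = 0; i < LINES_PER_DUMP; i++)
        log_line(cpu, i);
    pthread_mutex_unlock(&dump_lock);
    return NULL;
}

/* Run all dumps concurrently; returns 1 if no dump was interleaved. */
int run_dumps(void)
{
    pthread_t threads[NCPUS];
    int ids[NCPUS];
    const char *p;

    log_len = 0;
    log_buf[0] = '\0';
    for (int i = 0; i < NCPUS; i++) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, dump_cpu, &ids[i]) != 0)
            return 0;
    }
    for (int i = 0; i < NCPUS; i++)
        pthread_join(threads[i], NULL);

    /* Each group of LINES_PER_DUMP lines must come from one cpu, in order. */
    p = log_buf;
    for (int d = 0; d < NCPUS; d++) {
        int owner = -1;

        for (int i = 0; i < LINES_PER_DUMP; i++) {
            int cpu, line, n = 0;

            if (sscanf(p, "cpu%d line%d\n%n", &cpu, &line, &n) != 2)
                return 0;
            if (i == 0)
                owner = cpu;
            if (cpu != owner || line != i)
                return 0;
            p += n;
        }
    }
    return *p == '\0';
}
```

With the lock held across the whole dump, run_dumps() should report no
interleaving no matter which order the threads run in. That is why output
from one cpu showing up in the middle of another cpu's backtrace looks odd
to me.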

Another thing I wanted to ask DaveJ: did you recently turn on
CONFIG_PREEMPT? That would explain why you are seeing the softlockups
now. If you disable CONFIG_PREEMPT, do the softlockups disappear?

Cheers,
Don