Re: spinlock lockup, rcu stalls etc.

From: Paul E. McKenney
Date: Wed Jul 24 2013 - 10:13:52 EST


On Tue, Jul 23, 2013 at 07:24:31PM -0700, Linus Torvalds wrote:
> [ Added Thomas and Ingo due to that timer/nmi thing. ]
>
> On Tue, Jul 23, 2013 at 9:28 AM, Dave Jones <davej@xxxxxxxxxx> wrote:
> > Woke up to a box that I could log into, but would hang as soon as I tried to
> > do any disk IO. This is what hit the logs before that.
> >
> > [28853.503179] hrtimer: interrupt took 4847 ns
>
> Ugh. There's some nasty timer congestion there..
>
> > [28932.599607] BUG: spinlock lockup suspected on CPU#0, trinity-child2/6990
> > [28932.600419] lock: inode_sb_list_lock+0x0/0x80, .magic: dead4ead, .owner: trinity-child1/6763, .owner_cpu: 1
>
> So the current owner of the lock is cpu 1. The other CPU's agree:
>
> > [28932.606666] BUG: spinlock lockup suspected on CPU#2, trinity-child2/6764
> > [28932.606669] lock: inode_sb_list_lock+0x0/0x80, .magic: dead4ead, .owner: trinity-child1/6763, .owner_cpu: 1
> > [28932.617231] sending NMI to all CPUs:
> > [28932.635092] BUG: spinlock lockup suspected on CPU#3, trinity-child3/6975
> > [28932.635095] lock: inode_sb_list_lock+0x0/0x80, .magic: dead4ead, .owner: trinity-child1/6763, .owner_cpu: 1
>
> .. and their backtrace all points to them trying to take that
> spinlock. So that is all consistent.
>
> And here's cpu1, edited down a bit:
>
> > [28932.777623] NMI backtrace for cpu 1
> > [28932.777625] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 91.230 msecs
>
> Whee. 91 msec? We have something going on there too. irq entry locks?

Not sure what the deal is with the other stuff, but the stacks are being
dumped from NMI handlers, and maybe the console got backed up a bit,
slowing down the dump. -Might- account for the 91msec in an NMI handler.

Thanx, Paul

> > [28932.779440] CPU: 1 PID: 6763 Comm: trinity-child1 Not tainted 3.11.0-rc2+ #54
> > [28932.782283] RIP: 0010:[<ffffffff81710385>] [<ffffffff81710385>] add_preempt_count+0x25/0xf0
> > [28932.797761] Call Trace:
> > [28932.798737] <IRQ>
> > [28932.799715] [<ffffffff81320851>] delay_tsc+0x61/0xe0
> > [28932.800693] [<ffffffff81320789>] __const_udelay+0x29/0x30
> > [28932.801674] [<ffffffff810816d4>] __rcu_read_unlock+0x54/0xa0
> > [28932.802657] [<ffffffff810af531>] cpuacct_account_field+0xf1/0x200
> > [28932.804613] [<ffffffff8109e430>] account_system_time+0xb0/0x1b0
> > [28932.805561] [<ffffffff8109e565>] __vtime_account_system+0x35/0x40
> > [28932.806506] [<ffffffff8109e59d>] vtime_account_system.part.2+0x2d/0x50
> > [28932.807445] [<ffffffff8109e8e5>] vtime_account_irq_enter+0x55/0x80
> > [28932.808365] [<ffffffff8105f42f>] irq_enter+0x4f/0x90
> > [28932.809269] [<ffffffff81716a15>] smp_apic_timer_interrupt+0x35/0x60
> > [28932.810156] [<ffffffff817156af>] apic_timer_interrupt+0x6f/0x80
> > [28932.812756] [<ffffffff8105f53d>] irq_exit+0xcd/0xe0
> > [28932.813618] [<ffffffff81716a25>] smp_apic_timer_interrupt+0x45/0x60
> > [28932.814483] [<ffffffff817156af>] apic_timer_interrupt+0x6f/0x80
> > [28932.815345] <EOI>
> > [28932.816207] [<ffffffff8170c6a0>] ? retint_restore_args+0xe/0xe
> > [28932.817069] [<ffffffff810c1275>] ? lock_acquired+0x105/0x3f0
> > [28932.817924] [<ffffffff811ec8e2>] ? sync_inodes_sb+0x1c2/0x2a0
> > [28932.818767] [<ffffffff8170b62c>] _raw_spin_lock+0x6c/0x80
> > [28932.819616] [<ffffffff811ec8e2>] ? sync_inodes_sb+0x1c2/0x2a0
> > [28932.820468] [<ffffffff811ec8e2>] sync_inodes_sb+0x1c2/0x2a0
> > [28932.821310] [<ffffffff81708c1f>] ? wait_for_completion+0xdf/0x110
> > [28932.823819] [<ffffffff811f2a49>] sync_inodes_one_sb+0x19/0x20
> > [28932.824649] [<ffffffff811c2b12>] iterate_supers+0xb2/0x110
> > [28932.825477] [<ffffffff811f2cb5>] sys_sync+0x35/0x90
> > [28932.826300] [<ffffffff81714c54>] tracesys+0xdd/0xe2
> > [28932.827119] [<ffffffffa0000000>] ? 0xffffffff9fffffff
>
> .. and again, it actually looks like the time is not necessarily spent
> inside the spinlock in sync_inodes_sb(), but in a timer interrupt that
> just happened to go off during that. I wonder if this is the same
> issue that caused that earlier hrtimer long delay.. I'm not
> necessarily seeing 91 msecs worth, but..
>
> You seem to have CONFIG_PROVE_RCU_DELAY enabled, which explains that
> delay_tsc() call in there. I wonder how much things like that make it
> worse. Together with the (crazy bad) back-off in __spin_lock_debug(),
> there might be a *lot* of these delays.
>
> That said, the fact that your machine is dead after all this implies
> that there is something else wrong than just things being very slow.
> But I suspect at least *part* of your problems may be due to these
> kinds of debugging options that make things much much worse.
>
> Linus
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/