Re: BUG: spinlock recursion (sys_chdir, user_path_at,do_path_lookup ...)

From: Uwe Kleine-König
Date: Wed Jan 12 2011 - 02:52:44 EST


Hello,

On Tue, Jan 11, 2011 at 12:05:39PM +0100, Uwe Kleine-König wrote:
> when testing yesterday's Linus' master branch
> (a08948812b30653eb2c536ae613b635a989feb6f + some arch support including
> Trond's latest nfsfix[1]) I hit the following reproducibly:
>
> [ 5.580000] BUG: spinlock recursion on CPU#0, init/1
> [ 5.580000] lock: c7487e10, .magic: dead4ead, .owner: init/1, .owner_cpu: 0
> [ 5.590000] Backtrace:
> [ 5.590000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
> [ 5.600000] r7:c7487e10 r6:c0321368 r5:c7487e10 r4:c7848000
> [ 5.610000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b516c>] (spin_bug+0x90/0xa4)
> [ 5.620000] [<c01b50dc>] (spin_bug+0x0/0xa4) from [<c01b52d4>] (do_raw_spin_lock+0x50/0x154)
> [ 5.620000] r6:c7487e10 r5:c7487e10 r4:00000000
> [ 5.630000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [ 5.640000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [ 5.650000] r5:c7843efc r4:c7487dc0
> [ 5.650000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [ 5.660000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [ 5.670000] r6:c7843efc r5:c7843efc r4:00000000
> [ 5.680000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [ 5.680000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [ 5.690000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [ 5.700000] r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [ 5.710000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [ 5.720000] r5:be961ee4 r4:00063015
> [ 11.720000] BUG: spinlock lockup on CPU#0, init/1, c7487e10
> [ 11.730000] Backtrace:
> [ 11.730000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
> [ 11.740000] r7:c7842000 r6:c7487e10 r5:00000000 r4:00000000
> [ 11.740000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b539c>] (do_raw_spin_lock+0x118/0x154)
> [ 11.750000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [ 11.760000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [ 11.770000] r5:c7843efc r4:c7487dc0
> [ 11.780000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [ 11.790000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [ 11.790000] r6:c7843efc r5:c7843efc r4:00000000
> [ 11.800000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [ 11.810000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [ 11.820000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [ 11.820000] r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [ 11.830000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [ 11.840000] r5:be961ee4 r4:00063015
> [ 75.280000] BUG: soft lockup - CPU#0 stuck for 64s! [init:1]
> [ 75.280000] Modules linked in:
> [ 75.280000] irq event stamp: 113662
> [ 75.280000] hardirqs last enabled at (113662): [<c0285a7c>] _raw_spin_unlock_irqrestore+0x48/0x50
> [ 75.280000] hardirqs last disabled at (113661): [<c0285398>] _raw_spin_lock_irqsave+0x30/0x64
> [ 75.280000] softirqs last enabled at (113509): [<c026447c>] rpc_wake_up_next+0x1b0/0x1c4
> [ 75.280000] softirqs last disabled at (113507): [<c02854f0>] _raw_spin_lock_bh+0x20/0x58
> [ 75.280000]
> [ 75.280000] Pid: 1, comm: init
> [ 75.280000] CPU: 0 Not tainted (2.6.37-04021-gb8b018c-dirty #41)
> [ 75.280000] PC is at do_raw_spin_lock+0xac/0x154
> [ 75.280000] LR is at do_raw_spin_lock+0xc0/0x154
> [ 75.280000] pc : [<c01b5330>] lr : [<c01b5344>] psr: 20000013
> [ 75.280000] sp : c7843dd0 ip : c7843cd4 fp : c7843e04
> [ 75.280000] r10: 06bd0000 r9 : 00000000 r8 : 00000000
> [ 75.280000] r7 : c7842000 r6 : c7487e10 r5 : 00000000 r4 : 03dd5aca
> [ 75.280000] r3 : 00000000 r2 : 00000001 r1 : c0285a74 r0 : 00000001
> [ 75.280000] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
> [ 75.280000] Control: 0005317f Table: 479a8000 DAC: 00000015
> [ 75.280000] [<c00356c4>] (show_regs+0x0/0x54) from [<c0089dac>] (watchdog_timer_fn+0x13c/0x1a4)
> [ 75.280000] r4:c7842000
> [ 75.280000] [<c0089c70>] (watchdog_timer_fn+0x0/0x1a4) from [<c006cb58>] (__run_hrtimer+0x114/0x1f0)
> [ 75.280000] [<c006ca44>] (__run_hrtimer+0x0/0x1f0) from [<c006ced8>] (hrtimer_interrupt+0x154/0x338)
> [ 75.280000] [<c006cd84>] (hrtimer_interrupt+0x0/0x338) from [<c003e36c>] (mxs_timer_interrupt+0x28/0x34)
> [ 75.280000] [<c003e344>] (mxs_timer_interrupt+0x0/0x34) from [<c008a408>] (handle_IRQ_event+0x7c/0x1a8)
> [ 75.280000] [<c008a38c>] (handle_IRQ_event+0x0/0x1a8) from [<c008c948>] (handle_level_irq+0xc8/0x148)
> [ 75.280000] [<c008c880>] (handle_level_irq+0x0/0x148) from [<c002d320>] (asm_do_IRQ+0x80/0xa4)
> [ 75.280000] r7:c7842000 r6:c7487e10 r5:00000000 r4:00000030
> [ 75.280000] [<c002d2a0>] (asm_do_IRQ+0x0/0xa4) from [<c0033ab8>] (__irq_svc+0x38/0x80)
> [ 75.280000] Exception stack(0xc7843d88 to 0xc7843dd0)
> [ 75.280000] 3d80: 00000001 c0285a74 00000001 00000000 03dd5aca 00000000
> [ 75.280000] 3da0: c7487e10 c7842000 00000000 00000000 06bd0000 c7843e04 c7843cd4 c7843dd0
> [ 75.280000] 3dc0: c01b5344 c01b5330 20000013 ffffffff
> [ 75.280000] r5:f5000000 r4:ffffffff
> [ 75.280000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [ 75.280000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [ 75.280000] r5:c7843efc r4:c7487dc0
> [ 75.280000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [ 75.280000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [ 75.280000] r6:c7843efc r5:c7843efc r4:00000000
> [ 75.280000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [ 75.280000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [ 75.280000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [ 75.280000] r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [ 75.280000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [ 75.280000] r5:be961ee4 r4:00063015
>
> I started to bisect, but already the first test case showed a different
> error (my getty dying every few seconds).
I have now bisected this one (the dying getty); the first bad commit is:

9c0729d (x86: Eliminate bp argument from the stack tracing routines)

It made an x86-specific change to include/linux/stacktrace.h.

According to tglx the lockup above "is related to nicks scalability
stuff". I haven't yet looked for the commit responsible for that one.
Is that necessary?
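
For readers unfamiliar with the "spinlock recursion" report quoted
above: as far as I understand it, the spinlock debugging code records
the owner of a lock (the .owner and .owner_cpu fields in the log) and
complains when the task trying to take the lock already holds it. Below
is a minimal user-space sketch of that kind of check. All names
(dbg_lock, dbg_task, dbg_lock_acquire, ...) are hypothetical and chosen
for illustration; this is not the kernel's actual implementation and it
does not model spinning or real concurrency.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as ".magic: dead4ead" in the report above. */
#define DBG_LOCK_MAGIC 0xdead4eadu

/* Hypothetical stand-in for a task; only its identity matters here. */
struct dbg_task {
	const char *comm;
	int pid;
};

/* Hypothetical debug lock that remembers who holds it. */
struct dbg_lock {
	unsigned int magic;
	const struct dbg_task *owner;	/* NULL while unlocked */
	int owner_cpu;			/* -1 while unlocked */
};

enum dbg_result {
	DBG_OK,
	DBG_BAD_MAGIC,		/* lock was never initialised or is corrupted */
	DBG_RECURSION,		/* current task already owns the lock */
	DBG_CPU_RECURSION,	/* current CPU already owns the lock */
	DBG_BUSY		/* held by someone else; a real lock would spin */
};

void dbg_lock_init(struct dbg_lock *lock)
{
	assert(lock != NULL);
	lock->magic = DBG_LOCK_MAGIC;
	lock->owner = NULL;
	lock->owner_cpu = -1;
}

/* Sanity checks done before trying to take the lock. */
enum dbg_result dbg_lock_check(const struct dbg_lock *lock,
			       const struct dbg_task *cur, int cpu)
{
	if (lock->magic != DBG_LOCK_MAGIC)
		return DBG_BAD_MAGIC;
	if (lock->owner == cur)
		return DBG_RECURSION;
	if (lock->owner_cpu == cpu)
		return DBG_CPU_RECURSION;
	return DBG_OK;
}

/* Take the lock if the checks pass and nobody else holds it. */
enum dbg_result dbg_lock_acquire(struct dbg_lock *lock,
				 const struct dbg_task *cur, int cpu)
{
	enum dbg_result res = dbg_lock_check(lock, cur, cpu);

	if (res != DBG_OK)
		return res;
	if (lock->owner != NULL)
		return DBG_BUSY;
	lock->owner = cur;
	lock->owner_cpu = cpu;
	return DBG_OK;
}

void dbg_lock_release(struct dbg_lock *lock)
{
	lock->owner = NULL;
	lock->owner_cpu = -1;
}
```

In this sketch, a task that calls dbg_lock_acquire() twice on the same
lock without releasing it in between gets DBG_RECURSION on the second
call. That is the situation the first report describes: init/1 on CPU#0
tries to take lock c7487e10 while it is already recorded as its owner.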

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |