lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

From: Michael S. Tsirkin
Date: Sun Mar 11 2007 - 10:17:13 EST


> Quoting Roland Dreier <roland.list@xxxxxxxxx>:
> Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!
>
> >Feb 27 17:47:52 sw169 kernel: [<ffffffff8053aaf1>] _spin_lock_irqsave+0x15/0x24
> >Feb 27 17:47:52 sw169 kernel: [<ffffffff88067a23>] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139
>
> It looks like this is deadlocking trying to take priv->lock in ipoib_neigh_destructor().
> One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING
> turned on, and then rerun this test. There's a good chance that this would
> diagnose the deadlock. (I don't have good access to my test machines right now, or
> else I would do it myself)

OK, I did that, but I get:
[13440.761857] INFO: trying to register non-static key.
[13440.766903] the code is fine but needs lockdep annotation.
[13440.772455] turning off the locking correctness validator.
and I am not sure what triggers this, or how to fix it so that the
validator can actually do its job.
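
My tentative understanding, which may well be wrong: with lock debugging
enabled, spin_lock_init() gives each call site a static lock_class_key, and
a lock that never went through it falls back to using its own address as
the key. If that address is not in static storage, the validator gives up
with the message above. Below is a simplified userspace model of that
check, not kernel code. The names model_lock, model_lock_init and
model_key_ok are hypothetical, and the lock_in_static_storage flag stands
in for the kernel's address-range test.

```c
#include <stddef.h>

/* Userspace model only: this type imitates the kernel's, it is not the real one. */
struct lock_class_key { char unused; };

struct model_lock {
	const struct lock_class_key *key;	/* NULL until the init macro runs */
	const char *name;
};

/*
 * Imitates the shape of spin_lock_init() with lock debugging enabled:
 * each call site gets its own static key object.
 */
#define model_lock_init(lock)					\
	do {							\
		static struct lock_class_key model_key;		\
		(lock)->key = &model_key;			\
		(lock)->name = #lock;				\
	} while (0)

/*
 * Hypothetical helper. Returns 1 if the lock's key would be accepted,
 * 0 if the validator would complain about a non-static key.
 * In this model a non-NULL key always comes from model_lock_init(),
 * so it is static by construction.
 */
static int model_key_ok(const struct model_lock *lock, int lock_in_static_storage)
{
	/* An initialized lock carries a pointer to a static key object. */
	if (lock->key != NULL)
		return 1;
	/* Otherwise the lock's own address serves as the key. */
	return lock_in_static_storage;
}
```

If this model is right, the message would mean that the lock being taken
lives in dynamically allocated memory and either was never initialized or
the pointer to it is bogus, but that is only a guess.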

Ingo, what key does the message refer to?

The stack dump seems to point to line 829 of
drivers/infiniband/ulp/ipoib/ipoib_main.c.

Full message below:

[13440.761857] INFO: trying to register non-static key.
[13440.766903] the code is fine but needs lockdep annotation.
[13440.772455] turning off the locking correctness validator.
[13440.778008] [<c023c082>] __lock_acquire+0xae4/0xbb9
[13440.783078] [<c023c43d>] lock_acquire+0x56/0x71
[13440.787784] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13440.794412] [<c051ad41>] _spin_lock_irqsave+0x32/0x41
[13440.799649] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13440.806275] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13440.812897] [<c04a1c1b>] dst_run_gc+0xc/0x118
[13440.817439] [<c022af6e>] run_timer_softirq+0x37/0x16b
[13440.822673] [<c04a1c0f>] dst_run_gc+0x0/0x118
[13440.827221] [<c04a3eab>] neigh_destroy+0xbe/0x104
[13440.832114] [<c04a1bb1>] dst_destroy+0x4d/0xab
[13440.836751] [<c04a1c64>] dst_run_gc+0x55/0x118
[13440.841384] [<c022b03f>] run_timer_softirq+0x108/0x16b
[13440.846711] [<c0227634>] __do_softirq+0x5a/0xd5
[13440.851427] [<c023b435>] trace_hardirqs_on+0x106/0x141
[13440.856754] [<c0227643>] __do_softirq+0x69/0xd5
[13440.861470] [<c02276e6>] do_softirq+0x37/0x4d
[13440.866016] [<c02167b0>] smp_apic_timer_interrupt+0x6b/0x77
[13440.871774] [<c02029ef>] default_idle+0x3b/0x54
[13440.876491] [<c02029ef>] default_idle+0x3b/0x54
[13440.881211] [<c0204c33>] apic_timer_interrupt+0x33/0x38
[13440.886624] [<c02029ef>] default_idle+0x3b/0x54
[13440.891342] [<c02029f1>] default_idle+0x3d/0x54
[13440.896061] [<c0202aaa>] cpu_idle+0xa2/0xbb
[13440.900436] =======================
[13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880
[13768.717353] [<c031f919>] _raw_spin_lock+0xda/0xfd
[13768.722247] [<c051ad48>] _spin_lock_irqsave+0x39/0x41
[13768.727486] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13768.734110] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
[13768.740735] [<c04a1c1b>] dst_run_gc+0xc/0x118
[13768.745276] [<c022af6e>] run_timer_softirq+0x37/0x16b
[13768.750517] [<c04a1c0f>] dst_run_gc+0x0/0x118
[13768.755061] [<c04a3eab>] neigh_destroy+0xbe/0x104
[13768.759955] [<c04a1bb1>] dst_destroy+0x4d/0xab
[13768.764586] [<c04a1c64>] dst_run_gc+0x55/0x118
[13768.769218] [<c022b03f>] run_timer_softirq+0x108/0x16b
[13768.774542] [<c0227634>] __do_softirq+0x5a/0xd5
[13768.779261] [<c023b435>] trace_hardirqs_on+0x106/0x141
[13768.784588] [<c0227643>] __do_softirq+0x69/0xd5
[13768.789308] [<c02276e6>] do_softirq+0x37/0x4d
[13768.793851] [<c02167b0>] smp_apic_timer_interrupt+0x6b/0x77
[13768.799609] [<c02029ef>] default_idle+0x3b/0x54
[13768.804326] [<c02029ef>] default_idle+0x3b/0x54
[13768.809054] [<c0204c33>] apic_timer_interrupt+0x33/0x38
[13768.814471] [<c02029ef>] default_idle+0x3b/0x54
[13768.819187] [<c02029f1>] default_idle+0x3d/0x54
[13768.823903] [<c0202aaa>] cpu_idle+0xa2/0xbb
[13768.828279] =======================


--
MST