Re: [PATCH] SYSVIPC - Fix the ipc structures initialization

From: Manfred Spraul
Date: Thu Nov 13 2008 - 01:10:14 EST


Andrew Morton wrote:
Time is starting to press on this one. Is there something which we can
revert which would fix this bug?
My previous analysis was bogus, let's start from scratch:

1) the initial oops report:
http://bugzilla.kernel.org/show_bug.cgi?id=11796#c0

- lockdep is enabled, the oops is somewhere in __lock_acquire
- the instruction that oopses is

>>> lock incl 0x138(%r12)
R12 is 0x0038004000000000

That could be an debug_atomic_inc() in __lock_acquire. The class pointer in the spinlock_t is not initialized, thus it crashes.
Ingo - is that possible?

2) the latest oops was actually a soft lockup:

It starts with:
[ 400.393024] INFO: trying to register non-static key.
[ 400.397005] the code is fine but needs lockdep annotation.
[ 400.397005] turning off the locking correctness validator.
[ 400.397005] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1
[ 400.397005] Call Trace:
[ 400.397005] [<ffffffff80257055>] static_obj+0x60/0x77
[ 400.397005] [<ffffffff8025af59>] __lock_acquire+0x1c8/0x779
[ 400.397005] [<ffffffff8025b59f>] lock_acquire+0x95/0xc2
[ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 400.397005] [<ffffffff8045117d>] _spin_lock+0x2d/0x5a
[ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 400.397005] [<ffffffff802feb07>] ipc_lock+0x62/0x99
[ 400.397005] [<ffffffff802feaa5>] ipc_lock+0x0/0x99
[ 400.397005] [<ffffffff802feb46>] ipc_lock_check+0x8/0x53
[ 400.397005] [<ffffffff803002c3>] sys_msgctl+0x188/0x461
[ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
[ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 400.397005] [<ffffffff80259ac7>] trace_hardirqs_on_caller+0x100/0x12a
[ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 400.397005] [<ffffffff80450d49>] trace_hardirqs_on_thunk+0x3a/0x3f
[ 400.397005] [<ffffffff80213021>] native_sched_clock+0x8c/0xa5
[ 400.397005] [<ffffffff80212e09>] sched_clock+0x5/0x7
[ 400.397005] [<ffffffff8020bf7a>] system_call_fastpath+0x16/0x1b
[ 400.397005]
[ 464.933003] BUG: soft lockup - CPU#2 stuck for 61s! [sysv_test2:4207]
[ 464.933006] Call Trace:
[ 464.933006] [<ffffffff8033dc6b>] _raw_spin_lock+0x98/0x100
[ 464.933006] [<ffffffff8045119e>] _spin_lock+0x4e/0x5a
[ 464.933006] [<ffffffff802feb07>] ipc_lock+0x62/0x99

For me, it reads like an uninitialized spinlock_t:
The static_obj test in kernel/lockdep.c notices that something is wrong and disables itself.
But then _raw_spin_lock() tries to acquire the uninitialized spinlock and loops forever, because noone does spin_unlock().
after 60 seconds, the soft lockup detection notices the problem and oopses.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/