Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff

From: Boqun Feng
Date: Wed Mar 01 2017 - 10:03:07 EST


On Mon, Feb 27, 2017 at 11:35:43AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 27, 2017 at 11:28:24AM +0100, Peter Zijlstra wrote:
> > On Mon, Feb 27, 2017 at 01:14:09PM +0800, Fengguang Wu wrote:
> > > Hello,
> > >
> > > This bisect result is not satisfactory, however the bug is very
> > > reproducible and looks still alive in mainline&linux-next. You may
> > > try the attached reproduce-* script to debug it.
> >
> > OK, let me try that, however, see below.
>
> What that reproduces does is:
>
>
> [ 17.169056] =====================================
> [ 17.171014] [ BUG: bad unlock balance detected! ]
> [ 17.172115] 4.10.0-10265-ge5d56ef #18 Not tainted
> [ 17.172115] -------------------------------------
> [ 17.172115] kworker/u2:0/5 is trying to release lock (ww_class_mutex) at:
> [ 17.172115] [<41575474>] ww_mutex_unlock+0x66/0x72
> [ 17.172115] but there are no more locks to release!
> [ 17.172115]
> [ 17.172115] other info that might help us debug this:
> [ 17.172115]
> [ 17.172115] other info that might help us debug this:
> [ 17.172115] 4 locks held by kworker/u2:0/5:
> [ 17.172115] #0: ("test-ww_mutex"){......}, at: [<41041def>] process_one_work+0x168/0x33a
> [ 17.172115] #1: ((&stress->work)){......}, at: [<41041def>] process_one_work+0x168/0x33a
> [ 17.172115] #2: (ww_class_acquire){......}, at: [<41041e47>] process_one_work+0x1c0/0x33a
> [ 17.172115] #3: (ww_class_mutex){......}, at: [<41057d98>] stress_inorder_work+0xbf/0x218
> [ 17.172115]
> [ 17.172115] stack backtrace:
> [ 17.172115]
> [ 17.172115] stack backtrace:
> [ 17.172115] CPU: 0 PID: 5 Comm: kworker/u2:0 Not tainted 4.10.0-10265-ge5d56ef #18
> [ 17.172115] Workqueue: test-ww_mutex stress_inorder_work
> [ 17.172115] Call Trace:
> [ 17.172115] dump_stack+0x16/0x18
> [ 17.172115] print_unlock_imbalance_bug+0xb4/0xc1
> [ 17.172115] ? ww_mutex_unlock+0x66/0x72
> [ 17.172115] lock_release+0x28d/0x2bb
> [ 17.172115] ? ww_mutex_unlock+0x66/0x72
> [ 17.172115] __mutex_unlock_slowpath+0x2d/0x1db
> [ 17.172115] ? schedule_hrtimeout_range+0xd/0xf
> [ 17.172115] mutex_unlock+0xb/0xd
> [ 17.172115] ww_mutex_unlock+0x66/0x72
> [ 17.172115] stress_inorder_work+0x10a/0x218
> [ 17.172115] process_one_work+0x1c0/0x33a
> [ 17.172115] ? process_one_work+0x168/0x33a
> [ 17.172115] worker_thread+0x22f/0x315
> [ 17.172115] kthread+0xed/0xf2
> [ 17.172115] ? process_scheduled_works+0x24/0x24
> [ 17.172115] ? __kthread_create_on_node+0x11f/0x11f
> [ 17.172115] ret_from_fork+0x21/0x30
>
> Which is an entirely different error.. Lemme look into that.

In test-ww_mutex, the stress will use 4096 locks to do the test, and
given held_lock::references only have 12 bits, so I think this is the
reason? I made a patch to reduce the lock number of stress test, and it
seems the problem is resolved.

Thoughts?

Regards,
Boqun

-------------------->8
From: Boqun Feng <boqun.feng@xxxxxxxxx>
Date: Wed, 1 Mar 2017 22:50:46 +0800
Subject: [PATCH] locking/ww_mutex: Adjust the lock number for stress test

Because there are only 12 bits in held_lock::references, so we only
support 4095 nested lock held in the same time, adjust the lock number
for ww_mutex stress test to kill one lockdep splat:

[ 17.169056] =====================================
[ 17.171014] [ BUG: bad unlock balance detected! ]
[ 17.172115] 4.10.0-10265-ge5d56ef #18 Not tainted
[ 17.172115] -------------------------------------
[ 17.172115] kworker/u2:0/5 is trying to release lock (ww_class_mutex) at:
[ 17.172115] [<41575474>] ww_mutex_unlock+0x66/0x72
[ 17.172115] but there are no more locks to release!
[ 17.172115]
[ 17.172115] other info that might help us debug this:
[ 17.172115]
[ 17.172115] other info that might help us debug this:
[ 17.172115] 4 locks held by kworker/u2:0/5:
[ 17.172115] #0: ("test-ww_mutex"){......}, at: [<41041def>] process_one_work+0x168/0x33a
[ 17.172115] #1: ((&stress->work)){......}, at: [<41041def>] process_one_work+0x168/0x33a
[ 17.172115] #2: (ww_class_acquire){......}, at: [<41041e47>] process_one_work+0x1c0/0x33a
[ 17.172115] #3: (ww_class_mutex){......}, at: [<41057d98>] stress_inorder_work+0xbf/0x218
[ 17.172115]
[ 17.172115] stack backtrace:
[ 17.172115]
[ 17.172115] stack backtrace:
[ 17.172115] CPU: 0 PID: 5 Comm: kworker/u2:0 Not tainted 4.10.0-10265-ge5d56ef #18
[ 17.172115] Workqueue: test-ww_mutex stress_inorder_work
[ 17.172115] Call Trace:
[ 17.172115] dump_stack+0x16/0x18
[ 17.172115] print_unlock_imbalance_bug+0xb4/0xc1
[ 17.172115] ? ww_mutex_unlock+0x66/0x72
[ 17.172115] lock_release+0x28d/0x2bb
[ 17.172115] ? ww_mutex_unlock+0x66/0x72
[ 17.172115] __mutex_unlock_slowpath+0x2d/0x1db
[ 17.172115] ? schedule_hrtimeout_range+0xd/0xf
[ 17.172115] mutex_unlock+0xb/0xd
[ 17.172115] ww_mutex_unlock+0x66/0x72
[ 17.172115] stress_inorder_work+0x10a/0x218
[ 17.172115] process_one_work+0x1c0/0x33a
[ 17.172115] ? process_one_work+0x168/0x33a
[ 17.172115] worker_thread+0x22f/0x315
[ 17.172115] kthread+0xed/0xf2
[ 17.172115] ? process_scheduled_works+0x24/0x24
[ 17.172115] ? __kthread_create_on_node+0x11f/0x11f
[ 17.172115] ret_from_fork+0x21/0x30

Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
---
kernel/locking/test-ww_mutex.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index da6c9a34f62f..9155561cca7b 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -627,7 +627,7 @@ static int __init test_ww_mutex_init(void)
if (ret)
return ret;

- ret = stress(4096, hweight32(STRESS_ALL)*ncpus, 1<<12, STRESS_ALL);
+ ret = stress(4095, hweight32(STRESS_ALL)*ncpus, 1<<12, STRESS_ALL);
if (ret)
return ret;

--
2.11.0