Re: [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all part of PROVE_LOCKING

From: Peter Zijlstra
Date: Mon Aug 21 2017 - 11:46:12 EST

Booting the very latest -tip on my test machine gets me the below splat.

Dave, TJ, FYI, lockdep grew annotations for completions; it remembers
which locks were taken before we complete() and checks none of those are
held while we wait_for_completion().

That is, something like:

mutex_lock(&A);
mutex_unlock(&A);
complete(&C);

mutex_lock(&A);
wait_for_completion(&C);

is now considered a problem. Note that in order for C to link to A, it
needs to have observed the complete() thread acquiring A after a
wait_for_completion(), something like:

wait_for_completion(&C)
                        mutex_lock(&A);
                        mutex_unlock(&A);
                        complete(&C);

That is, only locks observed to be taken between C's wait_for_completion() and
its complete() are considered.
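
To make explicit why the first pattern above is flagged, here is a minimal
sketch (mine, not taken from the report below) of the deadlock the rule is
meant to catch, reusing the A/C placeholders from the examples:

static DEFINE_MUTEX(A);
static DECLARE_COMPLETION(C);

/* waiter */
mutex_lock(&A);
wait_for_completion(&C);    /* sleeps until the completer calls complete() */
mutex_unlock(&A);

/* completer */
mutex_lock(&A);             /* blocks forever: the waiter still holds A */
mutex_unlock(&A);
complete(&C);               /* never reached, so the waiter never wakes up */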

Now, given the above observance rule and the fact that the below report
is from the complete() side, what appears to have happened is:


lockdep_map_acquire(&work->lockdep_map)
down_write(&A)

                        down_write(&A)
                        wait_for_completion(&C)

lockdep_map_acquire(&work_lockdep_map);
complete(&C)

Which lockdep then puked over because both sides saw the same work
class.

Byungchul, should we not exclude the work class itself? It appears to me
the workqueue code is explicitly parallel, or am I confused again?
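
For reference, a condensed sketch of where the same per-work class gets
acquired on both sides; this is paraphrased from memory of
kernel/workqueue.c (process_one_work() and flush_work()), so treat the
exact lines as approximate rather than verbatim:

/* execution side: process_one_work(), heavily condensed */
struct lockdep_map lockdep_map;

lockdep_copy_map(&lockdep_map, &work->lockdep_map);
...
lock_map_acquire(&pwq->wq->lockdep_map);
lock_map_acquire(&lockdep_map);         /* the per-work class */
worker->current_func(work);             /* e.g. xfs_buf_ioend_work() -> complete() */
lock_map_release(&lockdep_map);
lock_map_release(&pwq->wq->lockdep_map);

/* waiting side: flush_work(), heavily condensed */
lock_map_acquire(&work->lockdep_map);   /* same class as above */
lock_map_release(&work->lockdep_map);
...
wait_for_completion(&barr.done);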


[ 38.882358] ======================================================
[ 38.889256] WARNING: possible circular locking dependency detected
[ 38.896155] 4.13.0-rc6-00609-g645737361ab6-dirty #3 Not tainted
[ 38.902752] ------------------------------------------------------
[ 38.909650] kworker/0:0/3 is trying to acquire lock:
[ 38.915189] ((&ioend->io_work)){+.+.}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[ 38.924715]
[ 38.924715] but now in release context of a crosslock acquired at the following:
[ 38.934618] ((complete)&bp->b_iowait){+.+.}, at: [<ffffffff8151ce42>] xfs_buf_submit_wait+0xb2/0x590
[ 38.944919]
[ 38.944919] which lock already depends on the new lock.
[ 38.944919]
[ 38.954046]
[ 38.954046] the existing dependency chain (in reverse order) is:
[ 38.962397]
[ 38.962397] -> #2 ((complete)&bp->b_iowait){+.+.}:
[ 38.969401] __lock_acquire+0x10a1/0x1100
[ 38.974460] lock_acquire+0xea/0x1f0
[ 38.979036] wait_for_completion+0x3b/0x130
[ 38.984285] xfs_buf_submit_wait+0xb2/0x590
[ 38.989535] _xfs_buf_read+0x23/0x30
[ 38.994108] xfs_buf_read_map+0x14f/0x320
[ 38.999169] xfs_trans_read_buf_map+0x170/0x5d0
[ 39.004812] xfs_read_agf+0x97/0x1d0
[ 39.009386] xfs_alloc_read_agf+0x60/0x240
[ 39.014541] xfs_alloc_fix_freelist+0x32a/0x3d0
[ 39.020180] xfs_free_extent_fix_freelist+0x6b/0xa0
[ 39.026206] xfs_free_extent+0x48/0x120
[ 39.031068] xfs_trans_free_extent+0x57/0x200
[ 39.036512] xfs_extent_free_finish_item+0x26/0x40
[ 39.042442] xfs_defer_finish+0x174/0x770
[ 39.047494] xfs_itruncate_extents+0x126/0x470
[ 39.053035] xfs_setattr_size+0x2a1/0x310
[ 39.058093] xfs_vn_setattr_size+0x57/0x160
[ 39.063342] xfs_vn_setattr+0x50/0x70
[ 39.068015] notify_change+0x2ee/0x420
[ 39.072785] do_truncate+0x5d/0x90
[ 39.077165] path_openat+0xab2/0xc00
[ 39.081737] do_filp_open+0x8a/0xf0
[ 39.086213] do_sys_open+0x123/0x200
[ 39.090787] SyS_open+0x1e/0x20
[ 39.094876] entry_SYSCALL_64_fastpath+0x23/0xc2
[ 39.100611]
[ 39.100611] -> #1 (&xfs_nondir_ilock_class){++++}:
[ 39.107612] __lock_acquire+0x10a1/0x1100
[ 39.112660] lock_acquire+0xea/0x1f0
[ 39.117224] down_write_nested+0x26/0x60
[ 39.122184] xfs_ilock+0x166/0x220
[ 39.126563] __xfs_setfilesize+0x30/0x200
[ 39.131612] xfs_setfilesize_ioend+0x7f/0xb0
[ 39.136958] xfs_end_io+0x49/0xf0
[ 39.141240] process_one_work+0x273/0x6b0
[ 39.146288] worker_thread+0x48/0x3f0
[ 39.150960] kthread+0x147/0x180
[ 39.155146] ret_from_fork+0x2a/0x40
[ 39.159708]
[ 39.159708] -> #0 ((&ioend->io_work)){+.+.}:
[ 39.166126] process_one_work+0x244/0x6b0
[ 39.171184] worker_thread+0x48/0x3f0
[ 39.175845] kthread+0x147/0x180
[ 39.180020] ret_from_fork+0x2a/0x40
[ 39.184593] 0xffffffffffffffff
[ 39.188677]
[ 39.188677] other info that might help us debug this:
[ 39.188677]
[ 39.197611] Chain exists of:
[ 39.197611] (&ioend->io_work) --> &xfs_nondir_ilock_class --> (complete)&bp->b_iowait
[ 39.197611]
[ 39.211399] Possible unsafe locking scenario by crosslock:
[ 39.211399]
[ 39.219268] CPU0 CPU1
[ 39.224321] ---- ----
[ 39.229377] lock(&xfs_nondir_ilock_class);
[ 39.234135] lock((complete)&bp->b_iowait);
[ 39.238902] lock((&ioend->io_work));
[ 39.245889] unlock((complete)&bp->b_iowait);
[ 39.253660]
[ 39.253660] *** DEADLOCK ***
[ 39.253660]
[ 39.260267] 3 locks held by kworker/0:0/3:
[ 39.264838] #0: ("xfs-buf/%s"mp->m_fsname){++++}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[ 39.275523] #1: ((&bp->b_ioend_work)){+.+.}, at: [<ffffffff8110ed5f>] process_one_work+0x1ef/0x6b0
[ 39.285721] #2: (&x->wait#17){....}, at: [<ffffffff8113daed>] complete+0x1d/0x60
[ 39.294176]
[ 39.294176] stack backtrace:
[ 39.299039] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.13.0-rc6-00609-g645737361ab6-dirty #3
[ 39.308749] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 39.320209] Workqueue: xfs-buf/sdb1 xfs_buf_ioend_work
[ 39.325945] Call Trace:
[ 39.328677] dump_stack+0x86/0xcf
[ 39.332377] print_circular_bug+0x203/0x2f0
[ 39.337048] check_prev_add+0x3be/0x700
[ 39.341331] ? print_bfs_bug+0x40/0x40
[ 39.345518] lock_commit_crosslock+0x40d/0x5a0
[ 39.350479] ? lock_commit_crosslock+0x40d/0x5a0
[ 39.355633] ? xfs_buf_ioend_work+0x15/0x20
[ 39.360304] complete+0x29/0x60
[ 39.363810] xfs_buf_ioend+0x15e/0x470
[ 39.367987] xfs_buf_ioend_work+0x15/0x20
[ 39.372463] process_one_work+0x273/0x6b0
[ 39.376931] worker_thread+0x48/0x3f0
[ 39.381021] kthread+0x147/0x180
[ 39.384624] ? process_one_work+0x6b0/0x6b0
[ 39.389294] ? kthread_create_on_node+0x40/0x40
[ 39.394351] ret_from_fork+0x2a/0x40