Re: chain_key collision with 4.7.0-rec2+

From: David Ahern
Date: Tue Jun 21 2016 - 09:22:48 EST


On 6/21/16 7:13 AM, Peter Zijlstra wrote:
Plz as to not wrap dmesg output when pasting

not wrapped on my end -- the copy in Send folder in or the ouptut in your response.


IPv4 one:

[ 189.171084] ======================
[ 189.171555] [chain_key collision ]
[ 189.172020] 4.7.0-rc2+ #310 Tainted: G W
[ 189.172729] ----------------------
[ 189.173192] sshd/1463: Hash chain already cached but the contents don't > match!
[ 189.174236] Held locks:depth: 6
[ 189.174707] class_idx:1253 -> chain_key:00000000000004e5 > (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8144a84c>] lock_sock+0xb/0xd
[ 189.176376] class_idx:26 -> chain_key:00000000009ca01a > (rcu_read_lock){......}, at: [<ffffffff814421fb>] rcu_lock_acquire+0x0/0x20
[ 189.178014] class_idx:898 -> chain_key:0000001394034382 > (rcu_read_lock_bh){......}, at: [<ffffffff814421fb>] > rcu_lock_acquire+0x0/0x20
[ 189.179712] class_idx:898 -> chain_key:0002728068704382 > (rcu_read_lock_bh){......}, at: [<ffffffff81400393>] > rcu_lock_acquire+0x0/0x20
[ 189.181374] class_idx:1221 -> chain_key:4e500d0e087044c5 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: > [<ffffffff81408a7e>] __dev_queue_xmit+0x40c/0x72d
[ 189.183400] class_idx:1222 -> chain_key:01a1c10e0898ad0c > (_xmit_ETHER#2){+.-...}, at: [<ffffffff814287e6>] sch_direct_xmit+0x6f/0x167
[ 189.185178] Locks in cached chain:depth: 6
[ 189.185791] class_idx:1252 -> chain_key:00000000000004e4 > (((&ndev->rs_timer))){+.-...}
[ 189.186957] class_idx:26 -> chain_key:00000000009c801a > (rcu_read_lock){......}
[ 189.187991] class_idx:898 -> chain_key:0000001390034382 > (rcu_read_lock_bh){......}
[ 189.189023] class_idx:898 -> chain_key:0002720068704382 > (rcu_read_lock_bh){......}
[ 189.190050] class_idx:1221 -> chain_key:4e400d0e087044c5 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}
[ 189.191370] class_idx:1220 -> chain_key:01a1c10e0898ad0c > (&(&list->lock)->rlock#3){+.-...}

and IPv6:

[ 22.082615] ======================
[ 22.083126] [chain_key collision ]
[ 22.083646] 4.7.0-rc2+ #310 Tainted: G W
[ 22.084410] ----------------------
[ 22.084927] sshd/1401: Hash chain already cached but the contents don't > match!
[ 22.086010] Held locks:depth: 6
[ 22.086509] class_idx:1255 -> chain_key:00000000000004e7 > (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8144a84c>] lock_sock+0xb/0xd
[ 22.088253] class_idx:26 -> chain_key:00000000009ce01a > (rcu_read_lock){......}, at: [<ffffffff814cd8a2>] rcu_read_lock+0x0/0x68
[ 22.089999] class_idx:898 -> chain_key:000000139c034382 > (rcu_read_lock_bh){......}, at: [<ffffffff814a031b>] > rcu_lock_acquire+0x0/0x20
[ 22.091811] class_idx:898 -> chain_key:0002738068704382 > (rcu_read_lock_bh){......}, at: [<ffffffff81400393>] > rcu_lock_acquire+0x0/0x20
[ 22.093639] class_idx:1223 -> chain_key:4e700d0e087044c7 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: > [<ffffffff81408a7e>] __dev_queue_xmit+0x40c/0x72d
[ 22.095824] class_idx:1224 -> chain_key:01a1c10e0898ed06 > (_xmit_ETHER#2){+.-...}, at: [<ffffffff814287e6>] sch_direct_xmit+0x6f/0x167
[ 22.097616] Locks in cached chain:depth: 6
[ 22.098197] class_idx:1248 -> chain_key:00000000000004e0 > (((&ndev->rs_timer))){+.-...}
[ 22.099345] class_idx:26 -> chain_key:00000000009c001a > (rcu_read_lock){......}
[ 22.100419] class_idx:898 -> chain_key:0000001380034382 > (rcu_read_lock_bh){......}
[ 22.101535] class_idx:898 -> chain_key:0002700068704382 > (rcu_read_lock_bh){......}
[ 22.102656] class_idx:1223 -> chain_key:4e000d0e087044c7 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}
[ 22.104095] class_idx:1222 -> chain_key:01a1c10e0898ed06 > (&(&list->lock)->rlock#3){+.-...}


Right, so those hashes do indeed match up pretty nicely.

Now note that this isn't a new issue, just a new warning. Such
collisions were silent before and could/can/do hide actual lock cycles.

In any case, could you perhaps try with commit (from tip/locking/core)
to see if that makes it any better:

dfaaf3fa01d6 ("locking/lockdep: Use __jhash_mix() for iterate_chain_key()")

I'll add it and run for a while. I tend to see the trace once a day or so depending on my workload (ie., time to work on code).


I still have to consider a better solution to the problem, because
fundamentally collisions are unavoidable, and we should deal better with
them. But our mixing function did suck somewhat, so making collisions
less likely might give us a little more time to ponder.

But if you're really annoyed by it all, disable DEBUG_LOCKDEP for a
while and rest assured that its not more broken than it was before --
just louder :-)


Not really an option if I want to make sure I am not adding locking issues and the habit of ignoring splats defeats the purpose of enabling the lock debugging. So hopefully you come up with a better answer.

Thanks for the quick response