BUG: spinlock lockup, async_umap_flush_lock in 3.4, 3.7, 3.8

From: Hank Leininger
Date: Sun May 12 2013 - 23:54:52 EST


I have several systems with similar hardware that crash with "BUG:
spinlock" errors on async_umap_flush_lock, such as:

BUG: spinlock lockup suspected on CPU#0, sh/1166
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23

(More examples below.)

In general these crashes are very rare, but a specific userland workload
(lots of mongodb + sqlite reads and writes, while other CPUs are running
compute-heavy tasks) seems to trigger them within a few minutes to a few hours.
After 1-3 "spinlock lockup suspected" errors, the system locks up and does
not respond to Alt+SysRq.
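
For readers less familiar with these messages: the .magic, .owner and
.owner_cpu fields are only printed with CONFIG_DEBUG_SPINLOCK, where a CPU
that fails to take a spinlock after spinning for roughly a second prints
the lock's debug fields, i.e. the task and CPU that last recorded themselves
as owner. The sketch below is a single-threaded user-space approximation of
that check, not the kernel's code; the struct and function names are
hypothetical, and the spin budget is a plain iteration count rather than
the kernel's time-based loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Same value the kernel uses for SPINLOCK_MAGIC; any other value means the
 * lock was never initialized or its memory has been overwritten. */
#define DEBUG_SPINLOCK_MAGIC 0xdead4eadu

/* Hypothetical user-space stand-in for a debug spinlock: the raw lock word
 * plus the bookkeeping fields that appear in the "lock:" lines. Intended for
 * a single-threaded demonstration only. */
struct debug_spinlock {
    atomic_bool locked;
    unsigned int magic;
    const char *owner;  /* holder's comm/pid, e.g. "swapper/23/0" */
    int owner_cpu;      /* CPU the holder was on when it took the lock */
};

static void debug_spin_init(struct debug_spinlock *l)
{
    atomic_init(&l->locked, false);
    l->magic = DEBUG_SPINLOCK_MAGIC;
    l->owner = NULL;
    l->owner_cpu = -1;
}

static bool debug_spin_trylock(struct debug_spinlock *l)
{
    /* Succeeds only if the previous value was "unlocked". */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Try up to max_spins times. On success, record the new owner and return
 * true. On timeout, print a report shaped like the kernel's and return
 * false; the kernel instead keeps spinning after printing, so a holder that
 * never releases the lock leaves the waiting CPU stuck for good. */
static bool debug_spin_lock(struct debug_spinlock *l, const char *me,
                            int my_cpu, unsigned long max_spins)
{
    assert(l->magic == DEBUG_SPINLOCK_MAGIC); /* kernel reports "bad magic" */
    for (unsigned long i = 0; i < max_spins; i++) {
        if (debug_spin_trylock(l)) {
            l->owner = me;
            l->owner_cpu = my_cpu;
            return true;
        }
    }
    printf("BUG: spinlock lockup suspected on CPU#%d, %s\n", my_cpu, me);
    printf(" lock: %p, .magic: %08x, .owner: %s, .owner_cpu: %d\n",
           (void *)l, l->magic, l->owner ? l->owner : "<none>", l->owner_cpu);
    return false;
}

static void debug_spin_unlock(struct debug_spinlock *l)
{
    l->owner = NULL;
    l->owner_cpu = -1;
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

With a lock taken by "swapper/23/0" on CPU 23 and never released, a second
call from "sh/1166" on CPU 0 exhausts its spin budget and prints a report
naming swapper/23/0 and CPU 23 as the owner, the same shape as the first
example above.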

In the last couple of days I've reproduced the crash on one system with
3.7.1-gentoo, 3.8.11-gentoo, 3.8.11 vanilla, and 3.4.44 vanilla. When
I looked further back, over the past year another system crashed with
similar errors (under similar workload) running 3.7.0-gentoo and
3.8.4-gentoo. Further back than that there are 2-3 crashes on those
and other similar systems using 2.6.x and 3.0.x, but their errors are
different enough that they may not be related.

These systems each have:

Supermicro X8DTU-F motherboard
2x Xeon E5645 (6 cores each + hyperthreading)
24 GB ECC RAM
Adaptec 51645 RAID controller w/bbu
12x 2TB SAS disks

They use hardware RAID: 11 disks in a RAID6 with 1 hot spare; the main
partition is 16 TB.

They all use loop-AES v3.6g as a replacement loop.ko module to encrypt
their / filesystem (using the AES-NI instruction set).

3.8.11 .config pastebin: http://pastebin.com/u3BDPTvP

3.4.44 .config pastebin: http://pastebin.com/1Rpk9RVf

Generally speaking, 3.8.x and 3.4.44 kernels were compiled with GCC 4.7;
the older 3.7.x kernels were compiled with GCC 4.6.

Error messages, captured by serial consoles, newest crashes first:

Host1:

3.4.44
BUG: spinlock lockup on CPU#0, john/21637
lock: ffffffff816558d0, .magic: dead4ead, .owner: mongod/27646, .owner_cpu: 8
BUG: spinlock lockup on CPU#6, mongod/3256
lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18
BUG: spinlock lockup on CPU#20, khugepaged/735
lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18

3.8.11
BUG: spinlock lockup suspected on CPU#0, sh/1166
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23

3.8.11-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/3678, .owner_cpu: 4
BUG: spinlock lockup suspected on CPU#16, mongod/3115
lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5
BUG: spinlock lockup suspected on CPU#6, khugepaged/744
lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5

3.7.1-gentoo
BUG: spinlock lockup suspected on CPU#0, john/32030
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#19, mongod/18985
lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2
BUG: spinlock lockup suspected on CPU#3, scsi_eh_0/1407
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#9, khugepaged/741
lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2

Host2:

3.8.4-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/22377, .owner_cpu: 9
BUG: spinlock lockup suspected on CPU#4, mongod/3377
lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14
BUG: spinlock lockup suspected on CPU#21, mongod/3375
lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14

3.7.0-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongo/16561, .owner_cpu: 3

(The repeated crashes on Host2 led to irreparable ext4 corruption.)

I can provide System.map files if they are interesting. I'd be happy
to try a specific kernel, add patches to harvest more information in
the event of a crash, etc.

Thanks,

--

Hank Leininger <hlein@xxxxxxxxx>
3C2A 4EEE ED36 D136 18F2 1B30 47A8 D14B E13E 9C6A
