KVM: Softlockups in guests while running kernbench

From: Nikunj A Dadhania
Date: Thu May 10 2012 - 04:16:32 EST



I am running a 3.4.0-rc4 based kernel(commit: af3a3ab2), guest config
attached.

During my tests, I saw few softlockups inside the guests while running
kernbench inside the guest. I can reproduce this repeatedly.

Test Setup:

- Create 4 VMs (8vcpu, 1GB RAM)
- Run kernbench inside each guests (kernbench -f -M -H -o 16) in parallel


BUG: soft lockup - CPU#4 stuck for 23s! [cc1:5578]
Modules linked in:
CPU 4
Modules linked in:

Pid: 5578, comm: cc1 Not tainted 3.4.0-rc4-base+ #213 Red Hat KVM
RIP: 0010:[<ffffffff815e34d9>] [<ffffffff815e34d9>] _raw_spin_unlock_irqrestore+0x19/0x30
RSP: 0018:ffff8800385b1cc0 EFLAGS: 00000286
RAX: ffffea00006d1b00 RBX: ffff8800385b1d08 RCX: ffff88003d981680
RDX: ffffea000089e4e8 RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffff8800385b1cc8 R08: ffffea000089e4e8 R09: 00000000ffffffff
R10: ffffea0000643f50 R11: ffffea0000459000 R12: 0000000000000297
R13: ffff8800385b1c88 R14: ffff88003d981680 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88003d700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003cddcaae90 CR3: 0000000001a0b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cc1 (pid: 5578, threadinfo ffff8800385b0000, task ffff88003a20c0c0)
Stack:
000000000000000e ffff8800385b1d48 ffffffff8110d7ed ffff8800385b1d78
ffffea00006d1b00 ffff88003d981680 0000000000000005 0000000000000286
0000000200000000 ffffea00006d1b00 ffffea00007949f0 0000000000000000
Call Trace:
[<ffffffff8110d7ed>] release_pages+0x1ad/0x210
[<ffffffff81139424>] free_pages_and_swap_cache+0xb4/0xe0
[<ffffffff812d1da1>] ? cpumask_any_but+0x31/0x50
[<ffffffff8112515c>] tlb_flush_mmu+0x6c/0x90
[<ffffffff81125644>] tlb_finish_mmu+0x14/0x40
[<ffffffff8112c4a8>] exit_mmap+0xd8/0x140
[<ffffffff8103e143>] mmput+0x73/0x120
[<ffffffff81044fca>] exit_mm+0x10a/0x140
[<ffffffff8104516b>] do_exit+0x16b/0x450
[<ffffffff810454a5>] do_group_exit+0x55/0xd0
[<ffffffff81045537>] sys_exit_group+0x17/0x20
[<ffffffff815eb9e9>] system_call_fastpath+0x16/0x1b
Code: 66 90 e8 db b9 a4 ff 66 90 c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 53 66 66 66 66 90 48 89 f3 e8 1e ba a4 ff 66 90 48 89 df 57 9d <66> 66 90 66 90 5b c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00
Call Trace:
[<ffffffff8110d7ed>] release_pages+0x1ad/0x210
[<ffffffff81139424>] free_pages_and_swap_cache+0xb4/0xe0
[<ffffffff812d1da1>] ? cpumask_any_but+0x31/0x50
[<ffffffff8112515c>] tlb_flush_mmu+0x6c/0x90
[<ffffffff81125644>] tlb_finish_mmu+0x14/0x40
[<ffffffff8112c4a8>] exit_mmap+0xd8/0x140
[<ffffffff8103e143>] mmput+0x73/0x120
[<ffffffff81044fca>] exit_mm+0x10a/0x140
[<ffffffff8104516b>] do_exit+0x16b/0x450
[<ffffffff810454a5>] do_group_exit+0x55/0xd0
[<ffffffff81045537>] sys_exit_group+0x17/0x20
[<ffffffff815eb9e9>] system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#3 stuck for 22s! [khugepaged:52]
Modules linked in:
CPU 3
Modules linked in:

Pid: 52, comm: khugepaged Not tainted 3.4.0-rc4-base+ #213 Red Hat KVM
RIP: 0010:[<ffffffff815e34d9>] [<ffffffff815e34d9>] _raw_spin_unlock_irqrestore+0x19/0x30
RSP: 0018:ffff880038183b18 EFLAGS: 00000282
RAX: 000000000001529a RBX: 0000000000000010 RCX: ffffffffffffffe0
RDX: ffffffffffffffe1 RSI: 0000000000000282 RDI: 0000000000000282
RBP: ffff880038183b20 R08: 00000000ffffffe2 R09: 0000000000000000
R10: ffff88003e7fdf88 R11: 0000000000000000 R12: ffffea00009764f0
R13: 0000000200000282 R14: 0000000000000286 R15: 0000000000000009
FS: 0000000000000000(0000) GS:ffff88003d6c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000481010 CR3: 0000000001a0b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khugepaged (pid: 52, threadinfo ffff880038182000, task ffff88003817cb30)
Stack:
ffff88003d981680 ffff880038183b60 ffffffff8110d450 ffff880038183b70
ffffea0000976508 ffffea0000976508 ffff88003ddd8ca8 0000000000000ca8
0000000000457898 ffff880038183b80 ffffffff8110d4c6 ffff880038183b80
Call Trace:
[<ffffffff8110d450>] __page_cache_release+0x100/0x130
[<ffffffff8110d4c6>] __put_single_page+0x16/0x30
[<ffffffff8110dba5>] put_page+0x25/0x40
[<ffffffff8113933a>] free_page_and_swap_cache+0x2a/0x60
[<ffffffff81152381>] __collapse_huge_page_copy+0x1d1/0x200
[<ffffffff81152096>] ? __collapse_huge_page_isolate+0x116/0x1c0
[<ffffffff811526d7>] collapse_huge_page+0x327/0x500
[<ffffffff812d84be>] ? rb_erase+0xce/0x150
[<ffffffff81152a75>] khugepaged_scan_pmd+0x1c5/0x2f0
[<ffffffff81152ffe>] khugepaged_scan_mm_slot+0x17e/0x280
[<ffffffff8115407c>] khugepaged_do_scan+0xbc/0xe0
[<ffffffff81154f5f>] khugepaged_loop+0x5f/0x3c0
[<ffffffff81062bd0>] ? wake_up_bit+0x40/0x40
[<ffffffff811552c0>] ? khugepaged_loop+0x3c0/0x3c0
[<ffffffff8115530a>] khugepaged+0x4a/0xb0
[<ffffffff810624fe>] kthread+0x9e/0xb0
[<ffffffff815ecce4>] kernel_thread_helper+0x4/0x10
[<ffffffff81062460>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815ecce0>] ? gs_change+0x13/0x13
Code: 66 90 e8 db b9 a4 ff 66 90 c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 53 66 66 66 66 90 48 89 f3 e8 1e ba a4 ff 66 90 48 89 df 57 9d <66> 66 90 66 90 5b c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00
Call Trace:
[<ffffffff8110d450>] __page_cache_release+0x100/0x130
[<ffffffff8110d4c6>] __put_single_page+0x16/0x30
[<ffffffff8110dba5>] put_page+0x25/0x40
[<ffffffff8113933a>] free_page_and_swap_cache+0x2a/0x60
[<ffffffff81152381>] __collapse_huge_page_copy+0x1d1/0x200
[<ffffffff81152096>] ? __collapse_huge_page_isolate+0x116/0x1c0
[<ffffffff811526d7>] collapse_huge_page+0x327/0x500
[<ffffffff812d84be>] ? rb_erase+0xce/0x150
[<ffffffff81152a75>] khugepaged_scan_pmd+0x1c5/0x2f0
[<ffffffff81152ffe>] khugepaged_scan_mm_slot+0x17e/0x280
[<ffffffff8115407c>] khugepaged_do_scan+0xbc/0xe0
[<ffffffff81154f5f>] khugepaged_loop+0x5f/0x3c0
[<ffffffff81062bd0>] ? wake_up_bit+0x40/0x40
[<ffffffff811552c0>] ? khugepaged_loop+0x3c0/0x3c0
[<ffffffff8115530a>] khugepaged+0x4a/0xb0
[<ffffffff810624fe>] kthread+0x9e/0xb0
[<ffffffff815ecce4>] kernel_thread_helper+0x4/0x10
[<ffffffff81062460>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815ecce0>] ? gs_change+0x13/0x13
hrtimer: interrupt took 78981977 ns
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!
BUG: soft lockup - CPU#0 stuck for 22s! [khugepaged:52]
Modules linked in:
CPU 0
Modules linked in:

Pid: 52, comm: khugepaged Not tainted 3.4.0-rc4-base+ #213 Red Hat KVM
RIP: 0010:[<ffffffff8111286e>] [<ffffffff8111286e>] isolate_lru_page+0x12e/0x170
RSP: 0000:ffff880038183b90 EFLAGS: 00000286
RAX: 000000000001529a RBX: 0000000000000000 RCX: ffffffffffffffe0
RDX: ffffffffffffffe4 RSI: 0000000000000002 RDI: ffff88003d981b00
RBP: ffff880038183bc0 R08: 00000000ffffffe5 R09: 0000000000015258
R10: ffff88003d981680 R11: 0000000000000020 R12: ffffea0000483000
R13: ffffc90000182000 R14: 0000000000000003 R15: ffff88003b9f6000
FS: 0000000000000000(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f310dba6000 CR3: 0000000001a0b000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khugepaged (pid: 52, threadinfo ffff880038182000, task ffff88003817cb30)
Stack:
ffffffff81a55ce0 ffffea0000483000 ffff88003ddd2e40 0000000000000e40
00007fb857dc8000 0000000000000067 ffff880038183c30 ffffffff8115207d
ffff880038183bf0 ffffea0000833008 ffff88003ddd3000 000000013ddd2067
Call Trace:
[<ffffffff8115207d>] __collapse_huge_page_isolate+0xfd/0x1c0
[<ffffffff81152696>] collapse_huge_page+0x2e6/0x500
[<ffffffff81152a75>] khugepaged_scan_pmd+0x1c5/0x2f0
[<ffffffff81152ffe>] khugepaged_scan_mm_slot+0x17e/0x280
[<ffffffff8115407c>] khugepaged_do_scan+0xbc/0xe0
[<ffffffff81154f5f>] khugepaged_loop+0x5f/0x3c0
[<ffffffff81062bd0>] ? wake_up_bit+0x40/0x40
[<ffffffff811552c0>] ? khugepaged_loop+0x3c0/0x3c0
[<ffffffff8115530a>] khugepaged+0x4a/0xb0
[<ffffffff810624fe>] kthread+0x9e/0xb0
[<ffffffff815ecce4>] kernel_thread_helper+0x4/0x10
[<ffffffff81062460>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815ecce0>] ? gs_change+0x13/0x13
Code: 89 de 4c 89 ef 25 00 40 00 00 48 83 f8 01 19 d2 31 db 81 ca 00 fe ff ff e8 30 c6 00 00 4c 89 e7 e8 88 c6 f1 ff 66 90 fb 66 66 90 <66> 66 90 e9 08 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 89 c3 eb
Call Trace:
[<ffffffff8115207d>] __collapse_huge_page_isolate+0xfd/0x1c0
[<ffffffff81152696>] collapse_huge_page+0x2e6/0x500
[<ffffffff81152a75>] khugepaged_scan_pmd+0x1c5/0x2f0
[<ffffffff81152ffe>] khugepaged_scan_mm_slot+0x17e/0x280
[<ffffffff8115407c>] khugepaged_do_scan+0xbc/0xe0
[<ffffffff81154f5f>] khugepaged_loop+0x5f/0x3c0
[<ffffffff81062bd0>] ? wake_up_bit+0x40/0x40
[<ffffffff811552c0>] ? khugepaged_loop+0x3c0/0x3c0
[<ffffffff8115530a>] khugepaged+0x4a/0xb0
[<ffffffff810624fe>] kthread+0x9e/0xb0
[<ffffffff815ecce4>] kernel_thread_helper+0x4/0x10
[<ffffffff81062460>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815ecce0>] ? gs_change+0x13/0x13

Attachment: guestconfig
Description: Binary data