Re: large IO over loop mount can cause one CPU to block in wait state (was: KVM repeatable hang...)

From: Antoine Martin
Date: Sat May 22 2010 - 15:42:27 EST


FW: Jim Paris said on KVM ML:
This sounds like:
https://bugzilla.kernel.org/show_bug.cgi?id=15906
https://bugzilla.redhat.com/show_bug.cgi?id=588930
-jim

And indeed it does.
It is surprising that these bug entries are so recent, because I have been experiencing this particular bug for quite some time:
http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg27586.html
(Post from Jan 2010)
Probably since around 2.6.31, which matches the information in the bug reports above.
Is no-one else using loop mounts any more?
Please queue for -stable

Cheers
Antoine




On 05/22/2010 04:54 PM, Antoine Martin wrote:
This may be related to
"Re: PROBLEM: oom killer and swap weirdness on 2.6.3* kernels"
http://lkml.org/lkml/2010/5/21/387

And also:
" Re: Gentoo guest with smp: emerge freeze while recompile world"
http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg34629.html

Both the symptom and the way to trigger the problem are similar.
I originally posted this to the KVM ML because at the time the problem was only occurring on virtual machines (IMO because they have less RAM).

Cheers
Antoine


-------- Original Message --------
Subject: Re: repeatable hang with loop mount and heavy IO in guest (now in host - not KVM then..)
Date: Fri, 21 May 2010 16:38:41 +0700
From: Antoine Martin <antoine@xxxxxxxxxxxxx>
To: Michael Tokarev <mjt@xxxxxxxxxx>
CC: kvm@xxxxxxxxxxxxxxx



On 02/27/2010 12:38 AM, Antoine Martin wrote:
1 0 0 98 0 1| 0 0 | 66B 354B| 0 0 | 30 11
1 1 0 98 0 0| 0 0 | 66B 354B| 0 0 | 29 11
From that point onwards, nothing will happen.
The host has disk IO to spare... So what is it waiting for??
Moved to an AMD64 host. No effect.
Disabled swap before running the test. No effect.
Moved the guest to a fully up-to-date FC12 server
(2.6.31.6-145.fc12.x86_64), no effect.
[snip]

Well, well. I've just hit the exact same bug on another *host* (not a
guest this time), running stock Fedora 12.
So this isn't a KVM bug after all. Definitely a loop+ext(4?) bug.
It looks like you need a pretty big loop-mounted partition to trigger it
(bigger than the available RAM?).

This is what triggered it on a quad-core AMD system with 8GB of RAM and a software
RAID-1 partition:
mount -o loop 2GB.dd source
dd if=/dev/zero of=8GB.dd bs=1048576 count=8192
mkfs.ext4 -F 8GB.dd
mount -o loop 8GB.dd dest
rsync -rplogtD source/* dest/
umount source
umount dest
^ this is where it hangs (for over 10 minutes). I then tried to issue a 'sync' from another
terminal, which also hung.
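
In case anyone wants to reproduce this, here is the same sequence as a rough standalone script (run as root). It is only a sketch of the steps above: it assumes 2GB.dd already exists and contains some data, and the 'time' around the final umount is mine, added just to measure the stall:

#!/bin/sh
# Reproduce: heavy IO into a loop-mounted ext4 image, then umount.
set -e
mkdir -p source dest
# assumes 2GB.dd already exists and holds some files to copy
mount -o loop 2GB.dd source
# create an image bigger than available RAM (8GB here)
dd if=/dev/zero of=8GB.dd bs=1048576 count=8192
mkfs.ext4 -F 8GB.dd
mount -o loop 8GB.dd dest
# dirty a lot of pages through the loop device
rsync -rplogtD source/* dest/
umount source
# this is the umount that stalls; time it
time umount dest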
It took more than 10 minutes to settle itself; during that time one CPU
was stuck in wait state. Apart from that, the box was mostly idle.
dstat reported almost no IO at the time (<1MB/s).
I assume dstat reports page writeback like any other disk IO?
That RAID partition does ~60MB/s, so writing back 8GB shouldn't take much more than a couple of minutes (and that's even assuming it would have to write back the whole 8GB at umount time, which should not be the case).
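
To put numbers on that: 8192MB / 60MB/s ≈ 137s, so even a worst-case full writeback of the image should finish in a little over two minutes, not ten-plus. While the umount is stuck, the kernel's dirty/writeback counters show whether anything is actually being flushed; this is just a generic way of watching it, not part of the original test run:

# watch the amount of dirty and under-writeback memory, once a second
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'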

Cheers
Antoine

Here's the hung trace:
INFO: task umount:526 blocked for more than 120 seconds.
"echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount D 0000000000000002 0 526 32488 0x00000000
ffff880140f9fc88 0000000000000086 ffff880008e3c228 ffffffff810d5fd9
ffff880140f9fc28 ffff880140f9fcd8 ffff880140f9ffd8 ffff880140f9ffd8
ffff88021b5e03d8 000000000000f980 0000000000015740 ffff88021b5e03d8
Call Trace:
[<ffffffff810d5fd9>] ? sync_page+0x0/0x4a
[<ffffffff81046fbd>] ? __enqueue_entity+0x7b/0x7d
[<ffffffff8113a047>] ? bdi_sched_wait+0x0/0x12
[<ffffffff8113a055>] bdi_sched_wait+0xe/0x12
[<ffffffff814549f0>] __wait_on_bit+0x48/0x7b
[<ffffffff8102649f>] ? native_smp_send_reschedule+0x5c/0x5e
[<ffffffff81454a91>] out_of_line_wait_on_bit+0x6e/0x79
[<ffffffff8113a047>] ? bdi_sched_wait+0x0/0x12
[<ffffffff810748dc>] ? wake_bit_function+0x0/0x33
[<ffffffff8113ad0b>] wait_on_bit.clone.1+0x1e/0x20
[<ffffffff8113ad71>] bdi_sync_writeback+0x64/0x6b
[<ffffffff8113ad9a>] sync_inodes_sb+0x22/0xec
[<ffffffff8113e547>] __sync_filesystem+0x4e/0x77
[<ffffffff8113e71d>] sync_filesystem+0x4b/0x4f
[<ffffffff8111d6d9>] generic_shutdown_super+0x27/0xc9
[<ffffffff8111d7a2>] kill_block_super+0x27/0x3f
[<ffffffff8111ded7>] deactivate_super+0x56/0x6b
[<ffffffff81134262>] mntput_no_expire+0xb4/0xec
[<ffffffff8113482a>] sys_umount+0x2d5/0x304
[<ffffffff81458133>] ? do_page_fault+0x270/0x2a0
[<ffffffff81011d32>] system_call_fastpath+0x16/0x1b
[the identical umount trace was logged three more times at the 120-second hung-task intervals; repeats omitted]
INFO: task sync:741 blocked for more than 120 seconds.
"echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sync D 0000000000000000 0 741 552 0x00000000
ffff88003c31fed8 0000000000000086 0000000000000000 ffff8801b7472ec0
ffff88003c31fe38 0000000000000246 ffff88003c31ffd8 ffff88003c31ffd8
ffff8801b7473298 000000000000f980 0000000000015740 ffff8801b7473298
Call Trace:
[<ffffffff81455a2e>] __down_read+0x92/0xaa
[<ffffffff814550e1>] down_read+0x31/0x35
[<ffffffff8113e5f6>] sync_filesystems+0x86/0xf6
[<ffffffff8113e6b6>] sys_sync+0x17/0x33
[<ffffffff81011d32>] system_call_fastpath+0x16/0x1b
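
For the record, the traces above come from the hung-task detector, which only fires every 120 seconds. Anyone trying to catch the stall live can request the same blocked-task dump on demand through the standard SysRq interface:

# enable SysRq, then dump all tasks stuck in uninterruptible (D) state
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 60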


