IOWAIT problem

From: govind raj
Date: Tue Mar 21 2006 - 10:51:26 EST


Hi all,

We are facing a problem in the Linux kernel going into a near-100% IO-WAIT condition and the machine freezes.

We have already googled and gone through the various IOWAIT issues that have been reported specifically on Linux 2.6.x kernel but
could not find any resolutions to those email threads. We would appreciate if the experts can shed some light on this behaviour as we
are currently struggling with this situation for the last 4 weeks now.

Here is our environment:

Hardware: Intel 915 / 4 SATA ports on the motherboard / 1GB RAM
Hard disks: Seagate Barracuda and Western Digital
Hard disks capacity: 250 GB (Seagate) and 250 GB (Western Digital)

Kernel: Linux kernel 2.6.12.6
Other software: EVMS 2.5.4 (with 2.6 patch of the device mapper installed on the server)
Device driver: libata / ata_piix and MD (Multiple disk)

Here is what we are observing:

a) When we create a RAID set using just a single disk, everything works fine. We are able to mount the disk (or) create EVMS volumes
(or) read/write heavy quantities of data to this RAID set. This is absolutely No problem at all

b) When we create a RAID set (RAID0 or RAID1 or RAID5) spanning MORE than one disk, then we observe that very frequently, the 'top'
shows that the CPU IO-WAIT is reaching 100% and the whole machine just seems to freeze eternally.... untill we press the RESET button..

Some of the various activities that seem to trigger this are: Creating volumes or snapshots over this raid set spanning more than 1 disk,
mounting / unmounting the volumes on top of this RAID set, reading / writing the data. There is _NO_ particular sequence that is alone
caused this problem over the last 4 weeks.

Debug information:

When we do a top of the system, the following is the output:

===================================================
15:51:47 up 58 min, 7 users, load average: 0.99, 0.62, 0.25
47 processes: 45 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 0.0% 0.0% 0.0% 0.0% 0.0% 100.0% 0.0%
Mem: 507000k av, 141180k used, 365820k free, 0k shrd, 28216k
buff
51736k active, 70212k inactive
Swap: 0k av, 0k used, 0k free 78204k
cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU
COMMAND
12881 root 16 0 2152 992 780 R 0.1 0.1 0:00 0 top
1 root 16 0 724 300 256 S 0.0 0.0 0:00 0 init
2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0
migration/0
3 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0
ksoftirqd/0
4 root 10 -5 0 0 0 SW< 0.0 0.0 0:00 0
events/0
5 root 11 -5 0 0 0 SW< 0.0 0.0 0:00 0
khelper

===================================================

When we do a Ctrl + Scroll lock key sequence during the time the machine is in an IOWAIT state of 100%, the output is as follows:

===================================================

[<c011b9e8>] do_wait+0x27c/0x30f
[<c0114e72>] default_wake_function+0x0/0x12
[<c01181e8>] do_fork+0xff/0x165
[<c0114e72>] default_wake_function+0x0/0x12
[<c011bb13>] sys_wait4+0x28/0x2c
[<c0102d91>] syscall_call+0x7/0xb
umount D DCA52E80 0 6581 6580 (NOTLB)
d79c7e6c 00000086 e0192626 dca52e80 db7d7280 db128a40 e0190909 db128a40
00000000 c13f4e80 c13f4520 00000000 00005bd4 7200474b 00000072 db128a40
d79faa40 d79fab64 00000000 00000000 c13f4520 00000000 c13ef6c0
d79c7e78
Call Trace:
[<e0192626>] dm_table_unplug_all+0x27/0x31 [dm_mod]
[<e0190909>] dm_unplug_all+0x1d/0x2a [dm_mod]
[<c02d4ad5>] io_schedule+0x26/0x30
[<c014ed8c>] sync_buffer+0x0/0x3b
[<c014edc4>] sync_buffer+0x38/0x3b
[<c02d4c73>] __wait_on_bit+0x2b/0x51
[<c014ed8c>] sync_buffer+0x0/0x3b
[<c02d4d0a>] out_of_line_wait_on_bit+0x71/0x79
[<c01297d0>] wake_bit_function+0x0/0x34
[<c0151aaa>] submit_bh+0x128/0x14e
[<c01297d0>] wake_bit_function+0x0/0x34
[<c0151bb7>] sync_dirty_buffer+0x77/0xa6
[<c018400b>] ext2_sync_super+0x42/0x4a
[<c0183244>] ext2_put_super+0x26/0x9f
[<c015331e>] generic_shutdown_super+0x75/0xf9
[<c0153c20>] kill_block_super+0x1a/0x2e
[<c0153211>] deactivate_super+0x46/0x59
[<c01656b8>] sys_umount+0x6b/0x73
[<c014493b>] do_munmap+0xdf/0xeb
[<c0144979>] sys_munmap+0x32/0x4d
[<c01121de>] do_page_fault+0x0/0x4cc
[<c0102d37>] sysenter_past_esp+0x54/0x75

==========================================

Another situtation when the IO-WAIT went to 100%, the output of the Ctrl+Lock key sequence was as follows:

=========================================
evms D DA4D8F1C 0 12863 12861 12874 (NOTLB)
d6d4bd5c 00000086 c14c5480 da4d8f1c e0190909 c14c5480 da32ee40 e0192626
c14c5480 da4d8e80 c13f4520 00000000 0001aa83 759d11c3 000002f2 c0332c00
d6640530 d6640654 e0190909 00000000 c13f4520 00000000 c13f0340
d6d4bd68
Call Trace:
[<e0190909>] dm_unplug_all+0x1d/0x2a [dm_mod]
[<e0192626>] dm_table_unplug_all+0x27/0x31 [dm_mod]
[<e0190909>] dm_unplug_all+0x1d/0x2a [dm_mod]
[<c02d4ad5>] io_schedule+0x26/0x30
[<c01353ff>] sync_page+0x0/0x49
[<c0135445>] sync_page+0x46/0x49
[<c02d4c73>] __wait_on_bit+0x2b/0x51
[<c0135996>] wait_on_page_bit+0x71/0x7a
[<c01297d0>] wake_bit_function+0x0/0x34
[<c01297d0>] wake_bit_function+0x0/0x34
[<c01355c0>] wait_on_page_writeback_range+0xac/0xeb
[<c0135568>] wait_on_page_writeback_range+0x54/0xeb
[<c01357ad>] filemap_fdatawait+0x4b/0x52
[<c014ef30>] sync_blockdev+0x2a/0x36
[<c017539a>] quota_sync_sb+0x33/0xe9
[<c014f028>] freeze_bdev+0x45/0xc6
[<e019102d>] __lock_fs+0x71/0xa4 [dm_mod]
[<e01910f0>] dm_suspend+0x59/0x1bc [dm_mod]
[<c0114e72>] default_wake_function+0x0/0x12
[<c0114e72>] default_wake_function+0x0/0x12
[<e019319e>] __get_name_cell+0xe/0x60 [dm_mod]
[<e0193da8>] do_suspend+0xff/0x125 [dm_mod]
[<e0194dd0>] ctl_ioctl+0xdd/0x128 [dm_mod]
[<e0193f44>] dev_suspend+0x0/0x18 [dm_mod]
[<c015d28d>] do_ioctl+0x55/0x66
[<c015d539>] vfs_ioctl+0x184/0x192
[<c015d572>] sys_ioctl+0x2b/0x45
[<c0102d91>] syscall_call+0x7/0xb

evms S 00000000 0 12874 12863 12875 (NOTLB)
d4adbf1c 00000086 00000001 00000000 d6640a40 00000010 c0339000 00000000
000000d0 d4adbf98 c13f4520 00000000 00000234 f1f43abc 000002f8 c0332c00
d6640a40 d6640b64 002d1dc2 00000000 002d1dc2 d4adbf24 d4adbf98 d5c36e20 Call Trace:
[<c02d4bb2>] schedule_timeout+0x91/0xae
[<c01203b1>] process_timeout+0x0/0x9
[<c015e2d1>] do_poll+0x50/0xae
[<c015e311>] do_poll+0x90/0xae
[<c015e468>] sys_poll+0x139/0x1d1
[<c015da3f>] __pollwait+0x0/0x9b
[<c0102d91>] syscall_call+0x7/0xb
evms S D5F05F6C 0 12875 12874 (NOTLB)
d5f05f9c 00000086 bf5ff664 d5f05f6c 00000014 d6640530 bf5ff6f4 d6640530
00000000 c13f4e80 c13f4520 00000000 000034bf 70cc77ad 000002f2 d6640530
d4466020 d4466144 00000000 bf5ffbe0 d5f04000 bf5ffbe0 00000000 d5f04000 Call Trace:
[<c01020c2>] sys_rt_sigsuspend+0xb4/0xcf [<c0102d91>] syscall_call+0x7/0xb
===============================


When we do a iostat, the following is the output:

===================================================
# iostat -x /dev/sdb 1
Linux 2.6.12.6 ((none)) 04/11/23

avg-cpu: %user %nice %sys %idle
0.43 0.00 0.63 98.95

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util

avg-cpu: %user %nice %sys %idle
nan nan nan nan

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util

avg-cpu: %user %nice %sys %idle
nan nan nan nan

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util

avg-cpu: %user %nice %sys %idle
nan nan nan nan

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util

====================================================


Thanks in advance,
A.govind


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/