A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

From: Chakri n
Date: Fri Sep 28 2007 - 02:32:53 EST


Hi,

In my testing, a unresponsive file system can hang all I/O in the system.
This is not seen in 2.4.

I started 20 threads doing I/O on a NFS share. They are just doing 4K
writes in a loop.

Now I stop NFS server hosting the NFS share and start a
"dd" process to write a file on local EXT3 file system.

# dd if=/dev/zero of=/tmp/x count=1000

This process never progresses.
There is plenty of HIGH MEMORY available in the system, but this
process never progresses.

# free
total used free
shared buffers cached
Mem: 3238004 609340 2628664 0 15136
551024
-/+ buffers/cache: 43180 3194824
Swap: 4096532 0 4096532

vmstat on the machine:
# vmstat
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 21 0 2628416 15152 551024 0 0 0 0 28 344 0
0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 8 340 0
0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 26 343 0
0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 8 341 0
0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 26 357 0
0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 8 325 0
0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 26 343 0
0 0 100 0
0 21 0 2628416 15152 551024 0 0 0 0 8 325 0
0 0 100 0

The problem seems to be in balance_dirty_pages, which calculates
dirty_thresh based on only ZONE_NORMAL. The same scenario works fine
in 2.4. The dd processes finishes in no time.
NFS file systems can go offline, due to multiple reasons, a failed
switch, filer etc, but that should not effect other file systems in
the machine.
Can this behavior be fenced?, can the buffer cache be tuned so that
other processes do not see the effect?

The following is the back trace of the processes:
--------------------------------------
PID: 3552 TASK: cb1fc610 CPU: 0 COMMAND: "dd"
#0 [f5c04c38] schedule at c0624a34
#1 [f5c04cac] schedule_timeout at c06250ee
#2 [f5c04cf0] io_schedule_timeout at c0624c15
#3 [f5c04d04] congestion_wait at c045eb7d
#4 [f5c04d28] balance_dirty_pages_ratelimited_nr at c045ab91
#5 [f5c04d7c] generic_file_buffered_write at c0457148
#6 [f5c04e10] __generic_file_aio_write_nolock at c04576e5
#7 [f5c04e84] generic_file_aio_write at c0457799
#8 [f5c04eb4] ext3_file_write at f8888fd7
#9 [f5c04ed0] do_sync_write at c0472e27
#10 [f5c04f7c] vfs_write at c0473689
#11 [f5c04f98] sys_write at c0473c95
#12 [f5c04fb4] sysenter_entry at c0404ddf
------------------------------------------
PID: 3091 TASK: cb1f0100 CPU: 1 COMMAND: "test"
#0 [f6050c10] schedule at c0624a34
#1 [f6050c84] schedule_timeout at c06250ee
#2 [f6050cc8] io_schedule_timeout at c0624c15
#3 [f6050cdc] congestion_wait at c045eb7d
#4 [f6050d00] balance_dirty_pages_ratelimited_nr at c045ab91
#5 [f6050d54] generic_file_buffered_write at c0457148
#6 [f6050de8] __generic_file_aio_write_nolock at c04576e5
#7 [f6050e40] enqueue_entity at c042131f
#8 [f6050e5c] generic_file_aio_write at c0457799
#9 [f6050e8c] nfs_file_write at f8f90cee
#10 [f6050e9c] getnstimeofday at c043d3f7
#11 [f6050ed0] do_sync_write at c0472e27
#12 [f6050f7c] vfs_write at c0473689
#13 [f6050f98] sys_write at c0473c95
#14 [f6050fb4] sysenter_entry at c0404ddf

Thanks
--Chakri
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/