Re: [PATCH] Prevent large file writeback starvation

From: Alexander Bergolth
Date: Mon Mar 20 2006 - 17:37:41 EST


On 02/06/06 21:11, Andrew Morton wrote:
Mark Lord <lkml@xxxxxx> wrote:

A simple test I do for this:
$ mkdir t
$ cp /usr/src/*.bz2 t (about 400-500MB worth of kernel tar files)

In another window, I do this:

$ while (sleep 1); do echo -n "`date`: "; grep Dirty /proc/meminfo; done

And then watch the count get large, but take virtually forever
to count back down to a "safe" value.

Typing "sync" causes all the Dirty pages to immediately be flushed to disk,
as expected.

I've never seen that happen and I don't recall seeing any other reports of
it, so your machine must be doing something peculiar.

We are seeing the same issue on several boxes using xfs on some of them and ext3 on the others. Dirty pages are not periodically flushed to disk and even the sync command sometimes does only flush a small amount of the dirty buffers.
It looks like the problems arise after a few days uptime, a freshly booted system doesn't show the symptoms. (At least it doesn't show them when I'm looking out for them. ;))

I've written a small test-script to visualize the behavior. (Attached.)

The script creates a 200MB file, monitors nr_dirty in /proc/vmstat and
executes sync after some time.

The output looks like that:

-------------------- snip! bad: --------------------
Linux slime.wu-wien.ac.at 2.6.14-1.1653_FC4smp #1 SMP Tue Dec 13
21:46:01 EST 2005 i686 i686 i386 GNU/Linux
12:22:46 up 10 days, 18:12, 37 users, load average: 0.17, 0.15, 0.09
12:22:46 start: head -c 200000000 /dev/zero
>/var/tmp/dirty-buffers.EFYFF13399 # nr_dirty 1076
12:22:46 # nr_dirty 1805
12:22:47 end: head -c 200000000 /dev/zero
>/var/tmp/dirty-buffers.EFYFF13399 # nr_dirty 31061
12:22:51 # nr_dirty 25671
12:22:56 # nr_dirty 25724
12:23:01 # nr_dirty 25724
12:23:06 # nr_dirty 25724
12:23:11 # nr_dirty 25724
12:23:16 # nr_dirty 25724
12:23:21 # nr_dirty 25724
12:23:26 # nr_dirty 25724
12:23:31 # nr_dirty 25724
12:23:36 # nr_dirty 25724
12:23:41 # nr_dirty 25724
12:23:47 # nr_dirty 25725
12:23:52 # nr_dirty 25726
12:23:57 # nr_dirty 25728
12:24:02 # nr_dirty 25728
12:24:07 # nr_dirty 25728
12:24:12 # nr_dirty 25728
12:24:12 # nr_dirty 25728
12:24:12 start: sync # nr_dirty 25728
12:24:12 end: sync # nr_dirty 23566
12:24:17 # nr_dirty 23566
12:24:22 # nr_dirty 23582
12:24:27 # nr_dirty 23583
12:24:32 # nr_dirty 23583
12:24:37 # nr_dirty 23583
12:24:42 # nr_dirty 23583
12:24:47 # nr_dirty 23583
12:24:52 # nr_dirty 23583
12:24:57 # nr_dirty 23583
-------------------- snip! --------------------

While writing the temp-file, some buffers are flushed. (31061->25671)
But after writing is completed, the 25000 buffers remain dirty and are
not flushed after 30 secs, as I would expect. The sync causes the dirty
buffers to shrink from 25728 to 23566 but I'd expect that sync should
cause them to become near 0.

Here is the output of another system with a lower uptime that doesn't
show that behavior yet:

-------------------- snip! good: --------------------
Linux roaster.wu-wien.ac.at 2.6.12-1.1376_FC3.stk16smp #1 SMP Mon Aug 29
16:41:37 EDT 2005 i686 i686 i386 GNU/Linux
12:44:54 up 3 days, 1:50, 2 users, load average: 0.00, 0.16, 0.14
12:44:54 start: head -c 200000000 /dev/zero
>/tmp/dirty-buffers.cgRFjZ1720 # nr_dirty 2
12:44:54 # nr_dirty 2
12:44:55 end: head -c 200000000 /dev/zero >/tmp/dirty-buffers.cgRFjZ1720
# nr_dirty 31257
12:44:59 # nr_dirty 22239
12:45:04 # nr_dirty 22239
12:45:09 # nr_dirty 22239
12:45:14 # nr_dirty 22240
12:45:19 # nr_dirty 22240
12:45:24 # nr_dirty 22240
12:45:29 # nr_dirty 4830
12:45:34 # nr_dirty 1
12:45:39 # nr_dirty 1
12:45:44 # nr_dirty 2
12:45:49 # nr_dirty 2
12:45:54 # nr_dirty 2
12:45:59 # nr_dirty 2
12:46:04 # nr_dirty 1
12:46:09 # nr_dirty 1
12:46:14 # nr_dirty 1
12:46:19 # nr_dirty 1
12:46:19 # nr_dirty 1
12:46:19 start: sync # nr_dirty 1
12:46:19 end: sync # nr_dirty 0
12:46:24 # nr_dirty 0
12:46:29 # nr_dirty 0
12:46:34 # nr_dirty 0
12:46:39 # nr_dirty 0
12:46:44 # nr_dirty 0
12:46:49 # nr_dirty 0
12:46:54 # nr_dirty 0
12:46:59 # nr_dirty 1
12:47:04 # nr_dirty 1
-------------------- snip! --------------------

There are no special mount options used in the first example (ext3), noatime is used in the second example (xfs).

Cheers,
--leo

Attachment: dirty-buffers.sh
Description: application/shellscript