Queuing of disk writes

From: Charles Samuels
Date: Fri Apr 01 2011 - 16:05:10 EST


Kernel hackers,

I have an application that is writing large amounts of very fragmented data to
hard drives. That is, I could write megabytes of data in blocks of a few bytes
scattered around a multi-gigabyte file.
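To make the access pattern concrete, here is a minimal sketch of it in Python. The function name, block size, and the use of random offsets are assumptions for illustration only; the real application is not structured exactly like this.

```python
import os
import random


def scattered_writes(path, file_size, n_writes, block_size=8, seed=0):
    """Write n_writes tiny blocks at random offsets, then fsync once.

    Hypothetical stand-in for the real workload: many few-byte writes
    spread across one large file, all going through the page cache.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Extend the file to its full size up front (sparse on most filesystems).
        os.ftruncate(fd, file_size)
        written = 0
        for _ in range(n_writes):
            offset = rng.randrange(0, file_size - block_size + 1)
            # pwrite() takes an explicit offset, so no separate seek is needed.
            written += os.pwrite(fd, os.urandom(block_size), offset)
        # The single flush at the end is the call that takes so long
        # once a lot of dirty data has accumulated.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

Calling something like `scattered_writes("data.bin", 4 << 30, 500_000)` approximates the situation described: only a few megabytes of payload, but spread over a multi-gigabyte file.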

Obviously, doing this causes the hard drive to seek a lot and takes a while.
From what I understand, if I allow Linux to cache the writes, it will fill up
the kernel's write cache, and then consequently the disk drive's DMA queue. As
a result of that, the hard drive can pick the correct order to do these writes,
significantly reducing seek times.

However, there's a major cost in allowing the write cache to fill: fsync takes
*ages*. What's worse is that while fsync is proceeding, it seems *all* disk
operations in the OS are blocked. This is really terrible for performance of
my application: my application might want to do some reads (e.g. from another
thread) from the disk preempting the fsync temporarily. It's also really
terrible for me, because then my workstation becomes unresponsive for several
minutes.

My general question is how to mitigate this. Is it possible to get a signal
for when a file is out of the disk cache? Or can I ask Linux approximately how
much data is in the write queue for that specific file, and just do a sleep()
loop checking until it goes down to something manageable, at which point I do
the fsync? Or, does aio support this scenario well, and if so, from what
version of Linux? (I've determined that there are some scenarios in which it
does, but it still requires O_DIRECT, apparently, which is weird considering
how I've heard Linux kernel hackers feel about that particular flag).
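
The polling idea would look roughly like the sketch below. The only counters it uses are the system-wide Dirty and Writeback lines in /proc/meminfo, which is an assumption made for lack of a per-file equivalent, and is exactly the gap the question is about. The function names, threshold, and polling interval are hypothetical.

```python
import time


def dirty_kb(meminfo_text):
    """Return Dirty + Writeback (in kB) parsed from /proc/meminfo contents."""
    total = 0
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        # Exact key match, so e.g. WritebackTmp is not counted.
        if key in ("Dirty", "Writeback"):
            total += int(rest.split()[0])
    return total


def read_meminfo():
    """Read the live counters; Linux only."""
    with open("/proc/meminfo") as f:
        return f.read()


def wait_for_writeback(threshold_kb, read=read_meminfo, sleep=time.sleep,
                       interval=0.5, max_polls=1200):
    """Poll until system-wide dirty data is at or below threshold_kb.

    Returns True once the threshold is reached, False if max_polls
    readings all stayed above it.
    """
    for _ in range(max_polls):
        if dirty_kb(read()) <= threshold_kb:
            return True
        sleep(interval)
    return False
```

The application would call `wait_for_writeback(...)` and only then issue the fsync. Since the counters cover every file on the system, other writers would skew the result, so this is at best an approximation of what is being asked for.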

And yes, I *know* fsync is a poor method to determine if data is actually
committed to something non-volatile. :)

Thanks for the help,

Charles
