Re: question about IO-sched

From: gaoqiang
Date: Thu Jul 19 2012 - 05:12:45 EST

Thanks very much.

On Wed, 18 Jul 2012 14:51:09 +0800, Corrado Zoccolo <czoccolo@xxxxxxxxx> wrote:

On Sun, Jul 15, 2012 at 9:08 AM, gaoqiang <gaoqiangscut@xxxxxxxxx> wrote:

Many thanks. But why does the sys_read operation hang on sync_page? There
is plenty of free memory. I mean actually free memory, excluding the
various kinds of caches and buffers. The following text explains sync_page:

->sync_page() is an awful misnomer. Usually, when page IO operation is
requested by calling ->writepage() or ->readpage(), file-system queues
IO request (e.g., disk-based file system may do this by calling
submit_bio()), but underlying device driver does not proceed with this
IO immediately, because IO scheduling is more efficient when there are
multiple requests in the queue.
Only when something really wants to wait for IO completion
(wait_on_page_{locked,writeback}() are used to wait for read and write
completion respectively) IO queue is processed. To do this
wait_on_page_bit() calls ->sync_page() (see block_sync_page()---standard
implementation of ->sync_page() for disk-based file systems).
So, semantics of ->sync_page() are roughly "kick underlying storage
driver to actually perform all IO queued for this page, and, maybe, for
other pages on this device too".

It is expected that sys_read will wait until the data is available for
the process.
If you don't want to wait (because you can do other stuff in the
meantime, including queuing other I/O operations), you can use aio_read.
The kernel will notify your process when the operation completes and
the data is available in memory.
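
For illustration, here is a minimal sketch of the aio_read approach using
the POSIX interface from <aio.h>. The helper name read_async is
hypothetical, and the sketch waits for completion with aio_suspend instead
of requesting a signal or callback. On older glibc versions it may need to
be linked with -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): read up to `len` bytes from
 * the start of `path` without blocking inside read().
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_async(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE; /* no signal, we check below */

    /* Queue the request; this returns without waiting for the data. */
    if (aio_read(&cb) != 0) {
        close(fd);
        return -1;
    }

    /* ... other work, including queuing more I/O, could go here ... */

    /* Sleep until the request is done. aio_suspend may return early
     * (for example on a signal), so the loop re-checks the status. */
    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    int err = aio_error(&cb);     /* 0 on success, an errno value otherwise */
    ssize_t n = aio_return(&cb);  /* collect the result exactly once */
    close(fd);
    return err == 0 ? n : -1;
}
```

A caller would pass a path and a buffer, for example
read_async("/etc/hostname", buf, sizeof(buf)), and put its own work where
the comment indicates. The buffer must stay valid until the request has
completed, which is why the helper does not return while the read is still
in progress.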


On Fri, 13 Jul 2012 22:15:31 +0800, Corrado Zoccolo <czoccolo@xxxxxxxxx> wrote:

The catch is that writes are "fire and forget", so they keep accumulating
in the I/O scheduler, and there are always plenty of them to schedule
(unless you explicitly make the writes synchronous).

The reader, instead, waits for the result of each read operation before
scheduling a new read, so there is at most one outstanding read, and
sometimes none.

The deadline scheduler is work conserving, meaning that it never leaves
the disk idle when there is work queued. Most of the time after an
operation completes, there is only write work queued, so you see mostly
writes being sent to the device.

Only schedulers that delay writes while waiting for reads (such as
Anticipatory in older kernels, and now CFQ) can achieve higher
read-to-write ratios.
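
The argument above can be illustrated with a toy simulation. This is only
a sketch under stated assumptions, not the kernel's algorithm: one disk
that serves a request in `service` ticks, a writer that always has work
queued, and a reader that issues its next read `think` ticks after the
previous one completes. The `write_batch` parameter is an assumption
loosely inspired by the deadline scheduler's fifo_batch tunable, and
`anticipate` models a scheduler that idles briefly after a read. All
names are hypothetical.

```c
#include <assert.h>

/* Counters produced by the toy model. */
struct sim_result {
    long reads;   /* read requests dispatched */
    long writes;  /* write requests dispatched */
    long idle;    /* ticks the disk was kept idle on purpose */
};

/*
 * Toy model (assumption, not kernel code).
 *   horizon     - simulated ticks to run for
 *   service     - ticks needed to serve one request
 *   think       - ticks the reader needs before issuing its next read
 *   write_batch - writes sent back to back once writing starts
 *   anticipate  - ticks the scheduler idles after a read (0 = work conserving)
 */
struct sim_result simulate(long horizon, long service, long think,
                           long write_batch, long anticipate)
{
    struct sim_result r = {0, 0, 0};
    long t = 0;          /* current time */
    long next_read = 0;  /* time at which the reader's next read arrives */
    long idle_left = 0;  /* remaining idle window after the last read */

    assert(service > 0 && think >= 0 && write_batch > 0 && anticipate >= 0);

    while (t < horizon) {
        if (next_read <= t) {
            /* A read is pending: serve it first (reads are preferred). */
            t += service;
            r.reads++;
            next_read = t + think;  /* the next read is issued only now */
            idle_left = anticipate;
        } else if (idle_left > 0) {
            /* Not work conserving: keep the disk idle, hoping for a read. */
            long gap = next_read - t;
            long wait = gap < idle_left ? gap : idle_left;
            t += wait;
            r.idle += wait;
            idle_left -= wait;
        } else {
            /* Work conserving: only writes are queued, so send a batch. */
            t += service * write_batch;
            r.writes += write_batch;
        }
    }
    return r;
}
```

With simulate(1700, 1, 1, 16, 0) the model dispatches 100 reads and 1600
writes, even though a pending read is always served first. With an idle
window of one tick, simulate(1700, 1, 1, 16, 1), it dispatches 850 reads
and no writes. The model has no fairness mechanism, so writes are starved
completely in the second case; it only shows the direction of the effect.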


On Thu, Jul 12, 2012 at 11:01 AM, gaoqiang <gaoqiangscut@xxxxxxxxx> wrote:


I have long understood that the deadline scheduler prefers reads, but a
simple test gives the opposite result.

The test ran two processes at the same time, one reading and one
writing. They did nothing but I/O operations.
the other:

With the deadline I/O scheduler and ext4, the result was a read rate
below about 3 MB/s and a write rate of about 100 MB/s. I tested both
kernel-2.6.18 and kernel-2.6.32 and got the same result.

I added some debug output to the kernel and recompiled. I found that
this has little to do with the I/O scheduler layer, because the read
requests entering deadline were only 5% of the write requests. According
to /proc/<pid>/stack, the read process hangs on sync_page most of the time.
What is the matter? Can anyone help me?
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at majordomo-info.html

Please read the FAQ at

Using Opera's revolutionary e-mail client:


dott. Corrado Zoccolo mailto:czoccolo@xxxxxxxxx
PhD - Department of Computer Science - University of Pisa, Italy
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda
