Re: question about IO-sched

From: Corrado Zoccolo
Date: Wed Jul 18 2012 - 02:51:19 EST

On Sun, Jul 15, 2012 at 9:08 AM, gaoqiang <gaoqiangscut@xxxxxxxxx> wrote:
> Many thanks. But why does the sys_read operation hang on sync_page? There
> is still plenty of free memory. I mean actually free memory, excluding the
> various kinds of caches and buffers. The following explains sync_page:
> ->sync_page() is an awful misnomer. Usually, when a page IO operation is
> requested by calling ->writepage() or ->readpage(), the file system queues
> an IO request (e.g., a disk-based file system may do this by calling
> submit_bio()), but the underlying device driver does not proceed with this
> IO immediately, because IO scheduling is more efficient when there are
> multiple requests in the queue.
> Only when something really wants to wait for IO completion
> (wait_on_page_{locked,writeback}() are used to wait for read and write
> completion respectively) is the IO queue processed. To do this,
> wait_on_page_bit() calls ->sync_page() (see block_sync_page(), the
> standard implementation of ->sync_page() for disk-based file systems).
> So the semantics of ->sync_page() are roughly "kick the underlying storage
> driver to actually perform all IO queued for this page and, maybe, for
> other pages on this device too".
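To make the quoted description concrete, here is a rough pseudocode sketch
of that 2.6-era read path (illustrative only, not the literal kernel
source):

```
sys_read(fd, buf, len)
  -> ...->readpage(page)        /* file system queues a bio via submit_bio() */
  -> wait_on_page_locked(page)
       -> wait_on_page_bit(page, PG_locked)
            -> mapping->a_ops->sync_page(page)  /* e.g. block_sync_page() */
                 /* "kick" the device queue so the queued IO is issued */
       -> sleep until the read completes and the page is unlocked
```

So a process stack showing sync_page just means the reader is parked waiting
for its queued IO to finish, regardless of how much free memory there is.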

It is expected that sys_read will wait until the data is available for
the process.
If you don't want to wait (because you can do other stuff in the meantime,
including queuing other I/O operations), you can use aio_read.
The kernel will notify your process when the operation completes and
the data is available in memory.


> On Fri, 13 Jul 2012 22:15:31 +0800, Corrado Zoccolo <czoccolo@xxxxxxxxx> wrote:
>> Hi,
>> the catch is that writes are "fire and forget", so they keep accumulating
>> in the I/O sched, and there is always plenty of them to schedule (unless
>> you explicitly make sync writes).
>> The reader, instead, waits for the result of each read operation before
>> scheduling a new read, so there is at most one outstanding read, and
>> sometimes nothing.
>> The deadline scheduler is work conserving, meaning that it never leaves
>> the disk idle when there is work queued, and most of the time after an
>> operation completes, there is only write work queued, so you see many
>> more writes being sent to the device.
>> Only schedulers that delay writes waiting for reads (as Anticipatory in
>> old kernels, and now CFQ) can achieve higher read-to-write ratios.
>> Cheers
>> Corrado
>> On Thu, Jul 12, 2012 at 11:01 AM, gaoqiang <gaoqiangscut@xxxxxxxxx>
>> wrote:
>>> Hi all,
>>> I have long known that deadline is read-preferred, but a simple
>>> test gives the opposite result.
>>> Two processes run at the same time, one for reads and one for
>>> writes; they actually do nothing but IO operations:
>>> while (true) {
>>>     read();
>>> }
>>> the other:
>>> while (true) {
>>>     write();
>>> }
>>> With the deadline IO-sched and ext4, the read rate was below about
>>> 3 MB/s and the write rate about 100 MB/s. I have tested both
>>> kernel 2.6.18 and kernel 2.6.32, getting the same result.
>>> I added some debug information to the kernel and recompiled, and
>>> found that it has little to do with the IO-sched layer, because the
>>> read requests dropped into deadline were about 5% of the write
>>> requests. From /proc/<pid>/stack, the read process hangs on
>>> sync_page most of the time.
>>> What is the matter? Can anyone help me?
> --
> Using Opera's revolutionary e-mail client:


dott. Corrado Zoccolo mailto:czoccolo@xxxxxxxxx
PhD - Department of Computer Science - University of Pisa, Italy
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at