Re: stochastic fair queueing in the elevator [Re: [BENCHMARK] 2.4.20-ck3 / aa / rmap with contest]

From: Nick Piggin
Date: Mon Feb 10 2003 - 07:11:01 EST

Andrea Arcangeli wrote:

>On Mon, Feb 10, 2003 at 10:45:59PM +1100, Nick Piggin wrote:
>>perspective it does nullify the need for readahead (though
>>it is obviously still needed for other reasons).
>I'm guessing that physically it may be needed from a head perspective
>too, I doubt it only has to do with the in-core overhead. Seeing it all
>before reaching the seek point might allow the disk to do smarter things
>and to keep the head at the right place for longer, dunno. Anyways,
>whatever is the reason it doesn't make much difference from our point of
>view ;), but I don't expect this hardware behaviour changing in future
>high end storage.
I don't understand it at all. I mean there is no other IO going
on so there would be no reason for the disk head to move anyway.
Rotational latency should be basically non-existent due to track
buffers, and being FC RAID hardware you wouldn't expect them to
skimp on anything.

>NOTE: just to be sure, I'm not at all against anticipatory scheduling,
>it's clearly a very good feature to have (still I would like an option
>to disable it especially in heavy async environments like databases,
>where lots of writes are sync too) but it should probably be enabled
>by default, especially for the metadata reads that have to be
>synchronous by design.
Yes it definitely has to be selectable (in fact, in my current
version, setting antic_expire = 0 disables it), and Andrew has
been working on tuning the non anticipatory version into shape.
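
A minimal sketch of that switch, for illustration only and not the
actual scheduler code. Only antic_expire is a name from this thread;
the struct, the function, and the "last request was a sync read"
condition are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables; only antic_expire is named in this thread. */
struct as_tunables {
    unsigned int antic_expire;  /* longest wait for the next read; 0 = off */
};

/*
 * Return true if the scheduler may leave the queue idle after a request
 * completes, in the hope that the same process submits a nearby read.
 * Assumption: only synchronous reads are worth waiting on.
 */
static bool may_anticipate(const struct as_tunables *t, bool last_was_sync_read)
{
    assert(t != NULL);

    if (t->antic_expire == 0)   /* anticipation switched off */
        return false;

    return last_was_sync_read;
}
```

With antic_expire set to 0 the function never asks for a stall, so the
scheduler falls back to plain non-anticipatory behaviour.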

>In fact I wonder whether it may be interesting to also make it optionally
>controlled from a fs hint (of course we don't expect every fs to provide
>the hint), so that you stall I/O writes only when you know for sure
>you're going to submit another read in a few usec, just the time to
>parse the last metadata you read. Still a timeout would be needed for
>scheduling against RT etc..., but it could be a much more relaxed
>timeout with this option enabled, so it would need less accurate
>timings, and it would be less dependent on hardware, and it would
>be less prone to generate false positive stalls. The downside is having
>to add the hints.
It would be easy to anticipate or not based on hints. We could
anticipate sync writes if we wanted, lower expire time for sync
writes, increase it for async reads. It is really not very
complex (although the code needs tidying up).
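
Roughly, the policy could look like the sketch below. This is an
assumption-laden illustration, not code from the patch: the enums, the
struct, the function name, and the scaling factors (x2, x4, /2) are all
made up here. Only antic_expire and the general idea (shorter expiry
for sync writes, longer for async reads, a more relaxed timeout when
the fs hints that another read is coming) come from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request classes and filesystem hints. */
enum io_kind { IO_SYNC_READ, IO_ASYNC_READ, IO_SYNC_WRITE, IO_ASYNC_WRITE };
enum fs_hint { HINT_NONE, HINT_MORE_READS, HINT_NO_MORE_READS };

struct antic_policy {
    unsigned int antic_expire;   /* base timeout; 0 disables anticipation */
    int anticipate_sync_writes;  /* nonzero: also wait after sync writes */
};

/*
 * Return how long to wait for a follow-up request after one of the given
 * kind completes, in the same units as antic_expire. 0 means "do not wait".
 */
static unsigned int antic_timeout(const struct antic_policy *p,
                                  enum io_kind kind, enum fs_hint hint)
{
    assert(p != NULL);

    if (p->antic_expire == 0)
        return 0;                        /* feature disabled */

    if (hint == HINT_NO_MORE_READS)
        return 0;                        /* fs says nothing is coming */
    if (hint == HINT_MORE_READS)
        return p->antic_expire * 4;      /* relaxed timeout (assumed factor) */

    switch (kind) {
    case IO_SYNC_READ:
        return p->antic_expire;
    case IO_ASYNC_READ:
        return p->antic_expire * 2;      /* increased (assumed factor) */
    case IO_SYNC_WRITE:
        /* lowered expiry, and only if explicitly enabled */
        return p->anticipate_sync_writes ? p->antic_expire / 2 : 0;
    case IO_ASYNC_WRITE:
    default:
        return 0;
    }
}
```

A timeout is still returned in the hinted case, so a wrong hint can
only delay other I/O for a bounded time.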

