Re: stochastic fair queueing in the elevator [Re: [BENCHMARK] 2.4.20-ck3 / aa / rmap with contest]

From: Nick Piggin
Date: Sun Feb 09 2003 - 23:44:35 EST

Rik van Riel wrote:

>On Mon, 10 Feb 2003, Nick Piggin wrote:
>>Andrea Arcangeli wrote:
>>>The only way to get the minimal possible latency and maximal fairness is
>>>my new stochastic fair queueing idea.
>>Sounds nice. I would like to see per process disk distribution.
>Sounds like the easiest way to get that fair, indeed. Manage
>every disk as a separately scheduled resource...
I hope this option becomes available one day.
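
For readers unfamiliar with the term, here is a minimal sketch of
SFQ-style dispatch as it is known from network queueing. It is not
Andrea's actual design, which is not described in this thread. The name
sfq_dispatch, the bucket count and the modulo "hash" are assumptions
made for illustration only.

```python
from collections import deque


def sfq_dispatch(requests, n_buckets=8, salt=0):
    """Return a generic SFQ-style dispatch order (illustrative only).

    requests: iterable of (pid, request) pairs in submission order.
    Each pid is mapped to one of n_buckets FIFO queues, and the queues
    are drained round-robin, one request per non-empty bucket per pass.
    Two pids that collide share a bucket, so fairness is only
    "stochastic".
    """
    buckets = [deque() for _ in range(n_buckets)]
    for pid, req in requests:
        # Stand-in for a real hash (assumption): changing `salt` moves
        # pids to different buckets, which breaks up unlucky collisions.
        buckets[(pid + salt) % n_buckets].append((pid, req))

    order = []
    while any(buckets):  # an empty deque is falsy
        for queue in buckets:
            if queue:
                order.append(queue.popleft())
    return order


if __name__ == "__main__":
    submitted = [(1, "a1"), (1, "a2"), (1, "a3"), (2, "b1")]
    print(sfq_dispatch(submitted))
```

In this sketch the single request from pid 2 is dispatched right after
the first request from pid 1, instead of waiting behind all three.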

>>However, dependent reads obviously cannot merge with each other, so
>>you could end up, say, submitting 4K reads per process.
>Considering that one medium/far disk seek counts for about 400 kB
>of data read/write, I suspect we'll just want to merge requests or
>put adjacent requests next to each other into the elevator up to
>a fairly large size. Probably about 1 MB for a hard disk or a cdrom,
>but much less for floppies, opticals, etc...
Yes, but the point is _dependent reads_. This is why Andrea's approach
alone can't solve dependent reads vs writes or non-dependent reads
while maintaining good throughput.
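
A back-of-the-envelope sketch of why small dependent reads hurt, taking
the "one seek is worth about 400 kB" figure quoted above at face value.
The function name and the one-seek-per-request model are assumptions
for illustration, not measurements.

```python
SEEK_COST_KB = 400  # figure quoted above: one medium/far seek ~ 400 kB of transfer


def transfer_efficiency(request_kb, seek_cost_kb=SEEK_COST_KB):
    """Fraction of disk time spent moving data if every request pays one seek.

    Assumption (illustrative only): each request is preceded by exactly
    one medium/far seek, and the seek cost is expressed as the amount of
    data that could have been transferred in the same time.
    """
    return request_kb / (request_kb + seek_cost_kb)


if __name__ == "__main__":
    for size_kb in (4, 64, 1024):
        share = transfer_efficiency(size_kb)
        print(f"{size_kb:5d} kB per seek: {share:.1%} of disk time transferring")
```

Under these assumptions a 4 kB read per seek spends roughly 1% of the
disk's time transferring data, against about 72% for a 1 MB request.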

>>But your solution also does nothing for sequential IO throughput in
>>the presence of more than one submitter.
>>I think you should be giving each process a timeslice,
>That is the anticipatory scheduler. A good complement to the SFQ
>part of the IO scheduler. I'd really like to see both ideas together
>in one scheduler, they sound like a winning pair.
No, anticipatory scheduling just "pauses for a bit after some reads".
I am talking about all this nonsense about trying to measure seek vs
throughput vs rotational latency, etc. We don't process schedule by
how many memory accesses a process makes * processor speed / memory
speed + instruction jumps...

If you want a fair disk scheduler, just give each process a disk
timeslice, and all your seek, stream, settle, track buffer, etc
accounting is magically done for you, for any device, and it is
100% tunable. The point is, account by _time_.
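
A minimal sketch of that idea, assuming we already know how long each
request took to service. The names timeslice_schedule and slice_ms and
the example workloads are hypothetical; this is a toy model, not kernel
code.

```python
from collections import deque


def timeslice_schedule(queues, slice_ms):
    """Round-robin over processes, charging each one for disk *time* used.

    queues:   dict mapping a process name to a list of per-request
              service times in ms (seek + settle + transfer, whatever
              the request actually cost).
    slice_ms: disk time each process may use per turn.
    Returns the dispatch order as (process, service_ms) tuples.
    """
    pending = {name: deque(reqs) for name, reqs in queues.items()}
    ring = deque(name for name, reqs in pending.items() if reqs)
    order = []
    while ring:
        name = ring.popleft()
        used = 0.0
        # Serve this process until its slice is used up or it runs out
        # of I/O. A request that has started always finishes, so a slice
        # can overrun by at most one request.
        while pending[name] and used < slice_ms:
            cost = pending[name].popleft()
            used += cost
            order.append((name, cost))
        if pending[name]:
            ring.append(name)  # still has work: back of the line
    return order


if __name__ == "__main__":
    work = {"streamer": [1] * 20, "seeker": [10, 10]}
    for name, cost in timeslice_schedule(work, slice_ms=10):
        print(name, cost)
```

With a 10 ms slice the streamer gets ten cheap requests per turn and the
seeker gets one expensive one. Both are charged the same disk time, and
nothing had to model seeks or rotational latency explicitly.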


