Re: [PATCH] cfq-iosched: rework seeky detection

From: Corrado Zoccolo
Date: Mon Jan 11 2010 - 09:46:32 EST

On Mon, Jan 11, 2010 at 2:47 AM, Shaohua Li wrote:
> On Sat, Jan 09, 2010 at 11:59:17PM +0800, Corrado Zoccolo wrote:
>> Current seeky detection is based on average seek length.
>> This is suboptimal, since the average will not distinguish between:
>> * a process doing medium sized seeks
>> * a process doing some sequential requests interleaved with larger seeks
>> and even a medium seek can take a lot of time, if the requested sector
>> happens to be behind the disk head in the rotation (50% probability).
>> Therefore, we change the seeky queue detection to work as follows:
>> * each request can be classified as sequential if it is very close to
>>   the current head position, i.e. it is likely in the disk cache (disks
>>   usually read more data than requested, and put it in cache for
>>   subsequent reads). Otherwise, the request is classified as seeky.
>> * a history window of the last 32 requests is kept, storing the
>>   classification result.
>> * A queue is marked as seeky if more than 1/8 of the last 32 requests
>>   were seeky.
>> This patch fixes a regression reported by Yanmin, on mmap 64k random
>> reads.
> Can we treat a big request (say the request data is >= 32k) as not seeky,
> regardless of the seek distance? That way a 64k random sync read would
> also not be counted as seeky.
I think I understand what you are proposing, but I don't think request
size should matter at all for rotational disks.
Usually, the disk firmware will load a big chunk of data into its cache
even when asked to read a single sector, and will serve the following
sectors from the cache if you read them sequentially.

Now, in CFQ, what we really mean by saying that a queue is seeky is that
waiting a bit in order to serve another request from this queue doesn't
give any benefit w.r.t. switching to another queue.

So, if you read a single 64k block from disk and then seek, we can
service any other request in the meantime without losing bandwidth.
If instead you read 4k, then the next 4k, and so on up to 64k (as
happens with mmap when you fault in a single page at a time), then it
pays to wait for the next request, since it has a 3/4 chance of being
sequential, and so of being served from the cache.
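
To make the difference between the two cases concrete, here is a rough
userspace sketch of the history-window scheme from the patch
description. It is not the patch code: the names (seek_hist,
hist_update, hist_is_seeky, far_sector and the two demo functions), the
800-sector "very close" threshold and the two request patterns are
assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: distance (in 512-byte sectors) below which a request counts
 * as "very close" to the previous one. Illustrative value only. */
#define SEEK_THR_SECTORS 800ULL
#define HIST_LEN 32

struct seek_hist {
	uint32_t bits;     /* 1 bit per request, 1 = seeky, newest in bit 0 */
	uint64_t last_end; /* first sector after the previous request */
};

/* Classify one request and push the result into the 32-entry window. */
static void hist_update(struct seek_hist *h, uint64_t sector, uint32_t nr)
{
	uint64_t dist;

	assert(nr > 0);
	dist = sector > h->last_end ? sector - h->last_end
				    : h->last_end - sector;
	h->bits = (h->bits << 1) | (uint32_t)(dist > SEEK_THR_SECTORS);
	h->last_end = sector + nr;
}

/* Seeky if more than 1/8 of the last 32 requests were seeky. */
static int hist_is_seeky(const struct seek_hist *h)
{
	uint32_t v = h->bits;
	int n = 0;

	for (; v; v &= v - 1)	/* clear the lowest set bit */
		n++;
	return n > HIST_LEN / 8;
}

/* Hypothetical helper: deterministic start sectors that are far apart
 * (consecutive values differ by at least 81M sectors). */
static uint64_t far_sector(unsigned int i)
{
	return (uint64_t)((i * 7919u) % 1000u) * 1000000ULL;
}

/* Assumed workload 1: 64 scattered 64k reads (128 sectors each). */
static int demo_random_64k(void)
{
	struct seek_hist h = { 0, 0 };
	unsigned int i;

	for (i = 0; i < 64; i++)
		hist_update(&h, far_sector(i), 128);
	return hist_is_seeky(&h);
}

/* Assumed workload 2: 8 scattered 64k chunks, each read as 16
 * sequential 4k requests (8 sectors each). */
static int demo_mmap_4k(void)
{
	struct seek_hist h = { 0, 0 };
	unsigned int c, p;

	for (c = 0; c < 8; c++)
		for (p = 0; p < 16; p++)
			hist_update(&h, far_sector(c) + p * 8, 8);
	return hist_is_seeky(&h);
}
```

With these assumed patterns, demo_random_64k() returns 1 (every request
in the window is far from the previous one), while demo_mmap_4k()
returns 0 (only 2 of the last 32 requests are seeky, which is not more
than 1/8 of the window).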

I'm currently testing a patch that takes request size into account on
SSDs instead.
On SSDs, the location of a request doesn't mean anything, but its size
is meaningful.
Therefore, submitting many small requests from different queues
together can improve overall performance.


> Thanks,
> Shaohua