[RFC] cfq: adapt slice to number of processes doing I/O

From: Corrado Zoccolo
Date: Thu Sep 03 2009 - 07:07:16 EST

When the number of processes performing I/O concurrently increases, a
fixed time slice per process will cause large latencies.
In the patch, if there are more than 3 processes performing concurrent
I/O, we scale the time slice down proportionally.
To safeguard sequential bandwidth, we impose a minimum time slice,
computed from cfq_slice_idle (the idea is that cfq_slice_idle
approximates the cost for a seek).

I performed two tests, on a rotational disk:
* 32 concurrent processes performing random reads
** the bandwidth is improved from 466KB/s to 477KB/s
** the maximum latency is reduced from 7.667s to 1.728
* 32 concurrent processes performing sequential reads
** the bandwidth is reduced from 28093KB/s to 24393KB/s
** the maximum latency is reduced from 3.781s to 1.115s

I expect numbers to be even better on SSDs, where the penalty to
disrupt sequential read is much less.

Signed-off-by: Corrado Zoccolo <czoccolo@gmail-com>

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index fd7080e..cff4ca8 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -306,7 +306,15 @@ cfq_prio_to_slice(struct cfq_data *cfqd, struct
cfq_queue *cfqq)
static inline void
cfq_set_prio_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
- cfqq->slice_end = cfq_prio_to_slice(cfqd, cfqq) + jiffies;
+ unsigned low_slice = cfqd->cfq_slice_idle * (1 + cfq_cfqq_sync(cfqq));
+ unsigned interested_queues = cfq_class_rt(cfqq) ?
cfqd->busy_rt_queues : cfqd->busy_queues;
+ unsigned slice = cfq_prio_to_slice(cfqd, cfqq);
+ if (interested_queues > 3) {
+ slice *= 3;
+ slice /= interested_queues;
+ }
+ cfqq->slice_end = jiffies + max(slice, low_slice);
cfq_log_cfqq(cfqd, cfqq, "set_slice=%lu", cfqq->slice_end - jiffies);

Attachment: test2f.fio
Description: Binary data

Attachment: cfq_orig.out
Description: Binary data

Attachment: cfq_adapt.out
Description: Binary data