Re: [PATCH 02/20] blkio: Change CFQ to use CFS like queue time stamps

From: Corrado Zoccolo
Date: Thu Nov 05 2009 - 03:27:40 EST

Hi Vivek,
let me answer all your questions in a single mail.

On Thu, Nov 5, 2009 at 12:22 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> Hi Corrado,
> Had one more question. Now with dynamic slice length (reduce slice length
> to meet target latency), don't wee see reduced throughput on rotational
> media with sequential workload?
Yes. This is the main reason for disabling dynamic slice length when
low_latency is not set. In this way, on servers where low latency is
not a must (but still desirable), this feature can be disabled, while
the others, which have a positive impact on throughput, will not be
disabled.
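
The gating described above can be sketched as follows. This is a minimal
illustration, not the actual patch code; the function name
`adjusted_slice` and the proportional-shrink formula are assumptions made
for the example:

```c
#include <stdbool.h>

/*
 * Illustrative sketch (not the kernel's code): shrink the time slice
 * proportionally only when low_latency is set and the expected latency
 * would overshoot the target; otherwise keep the full slice, preserving
 * sequential throughput on rotational media.
 */
static int adjusted_slice(bool low_latency, int slice_ms,
                          int expect_ms, int target_ms)
{
	if (low_latency && expect_ms > target_ms)
		return slice_ms * target_ms / expect_ms;
	return slice_ms;	/* low_latency off: full slice, full throughput */
}
```

So with a 100ms base slice and six queues (expected 600ms vs. a 300ms
target), low_latency on would halve each slice, while low_latency off
leaves them untouched.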

> I saw you posted some numbers for SSDs. Do you have some numbers for
> rotational media also?
Yes. I posted them in the first RFC for this patch, outside the series:

The other patches in the series do not affect sequential bandwidth,
but can improve random read BW in the case of NCQ hardware, regardless
of whether it is rotational, SSD, or SAN.

> I am looking at your patchset and trying to understand how you have
> ensured fairness for different priority level queues.
> Following seems to be the key piece of code which determines the slice
> length of the queue dynamically.
> static inline void
> cfq_set_prio_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
> { [snipped code] }
> A question:
> - expect_latency seems to be calculated based on the base slice length
> for sync queues (100ms). This will give the right number only if all the
> queues in the system are of prio 4. What if there are 3 prio 0 queues?
> They will/should get a 180ms slice each, resulting in a max latency of
> 540 ms, but we will calculate expect_latency = 100 * 3 = 300 ms, which
> is less than cfq_target_latency, so we will not adjust the slice length?
Yes. Those are soft latencies, so we don't *guarantee* 300ms. On an
average system, where the average slice length is 100ms, we will get
pretty close (but since CFQ doesn't count the first seek in the time
slice, we can still be some tenths of ms off); if you have a different
distribution of priorities, though, this estimate will no longer hold.
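
The arithmetic behind the scenario above can be sketched like this. The
constants mirror the CFQ defaults under discussion (100ms sync base
slice, 300ms target latency, prio-4 baseline); the helper names are
illustrative, not the snipped kernel code:

```c
/* Defaults as discussed above; names are illustrative, not the kernel's. */
#define BASE_SLICE_SYNC		100	/* ms, sync base slice (prio 4) */
#define SLICE_SCALE		5	/* slice grows by base/5 per prio step */
#define TARGET_LATENCY		300	/* ms, cfq_target_latency */

/* ioprio-scaled slice: prio 4 gets the base slice, prio 0 gets more. */
static int prio_slice(int prio)
{
	return BASE_SLICE_SYNC + BASE_SLICE_SYNC / SLICE_SCALE * (4 - prio);
}

/*
 * expect_latency as described in the question: base slice times queue
 * count, which ignores each queue's actual priority-scaled slice.
 */
static int expect_latency(int nr_queues)
{
	return BASE_SLICE_SYNC * nr_queues;
}
```

With three prio-0 queues, prio_slice(0) is 180ms each (540ms worst-case
wait), yet expect_latency(3) is 300ms, which does not exceed
TARGET_LATENCY, so no shrinking kicks in; that is exactly the
discrepancy, and the soft-latency answer above, being discussed.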

> - With "no-idle" group, who benefits? As I said, all these optimizations
> seem to be for low latency. In that case the user will set the
> "low_latency" tunable in CFQ. If that's the case, then we will anyway
> enable idling for random seeky processes with a think time of less than
> 8ms. So they get their fair share.
My patch changes the meaning of low_latency. As we discussed some
months ago, I always thought that the solution of idling for seeky
processes was sub-optimal. With the new code, regardless of
low_latency settings, we won't idle between 'no-idle' queues. We will
idle only at the end of the no-idle tree, if we still have not reached
workload_expires. This provides fairness between 'no-idle' and normal
sync queues.
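
The idling rule just described can be sketched as a small decision
helper. This is an assumption-laden illustration of the logic, not the
patch's code; the struct and names (`should_idle`, `service_tree`) are
hypothetical:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a CFQ service tree: how many queues remain. */
struct service_tree {
	int count;
};

/*
 * Sketch of the rule above: never idle between 'no-idle' queues; idle
 * only once the whole no-idle tree is drained and the workload slice
 * (workload_expires) still has time left, which keeps the no-idle
 * workload from being starved by the normal sync queues.
 */
static bool should_idle(bool queue_is_no_idle,
			const struct service_tree *no_idle_tree,
			long now, long workload_expires)
{
	if (!queue_is_no_idle)
		return true;	/* normal sync queues keep per-queue idling */
	return no_idle_tree->count == 0 && now < workload_expires;
}
```

So a drained no-idle tree with workload time remaining idles once at the
tree's end, while switching between two no-idle queues never does.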
> I guess this will provide a benefit if the user has not set
> "low_latency"; in that case we will not enable idling on random seeky
> readers, and we will gain in terms of throughput on NCQ hardware because
> we dispatch from other no-idle queues and then idle on the no-idle group.
It will improve both latency and bandwidth, and as I said, it is no
longer limited to the case where low_latency is not set. After my patch
series, low_latency will control just two things:
* the dynamic timeslice adaption
* the dynamic threshold for number of writes dispatched
