Re: [patch]cfq-iosched: delete deep seeky queue idle logic

From: Shaohua Li
Date: Fri Sep 16 2011 - 02:36:55 EST


On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li <shaohua.li@xxxxxxxxx> wrote:
> > Recently Maxim and I discussed why his aiostress workload performs poorly. If
> > you didn't follow the discussion, here are the issues we found:
> > 1. cfq seeky detection isn't good. Assume a task accesses sectors A, B, C, D,
> > A+1, B+1, C+1, D+1, A+2... Accessing A, B, C, D is random, so cfq will detect
> > the queue as seeky; but by the time A+1 is accessed, it is already in the disk
> > cache, so this should really be detected as sequential. I'm not sure any real
> > workload has such an access pattern, and it doesn't seem easy to fix cleanly
> > either. Any ideas?
>
> Not all disks will cache 4 independent streams, we can't make that
> assumption in cfq.
Sure, we can't make that assumption. I'm wondering if we should move the
seeky detection to request completion: if the time between two request
completions is short, we treat the queue as sequential. That would make
the detection adaptive, but the time measurement doesn't seem easy.
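The problem with the current distance-based detection can be shown with a toy model (plain Python, not kernel code; the threshold and stream layout are invented for illustration): the interleaved A, B, C, D, A+1, ... pattern looks seeky to a distance check, even though each individual stream is perfectly sequential.

```python
# Toy model of CFQ-style seeky detection; not kernel code. The
# threshold is only the right order of magnitude, and the stream
# layout below is made up for illustration.

SEEK_THRESHOLD = 8192  # sectors; gaps larger than this count as seeks

def is_seeky(sectors):
    """Distance-based detection: seeky if most back-to-back gaps are large."""
    gaps = [abs(b - a) for a, b in zip(sectors, sectors[1:])]
    return sum(g > SEEK_THRESHOLD for g in gaps) > len(gaps) // 2

# Four streams A, B, C, D placed far apart on disk, accessed
# interleaved: A, B, C, D, A+1, B+1, C+1, D+1, ...
bases = [0, 1_000_000, 2_000_000, 3_000_000]
pattern = [base + i for i in range(8) for base in bases]

print(is_seeky(pattern))  # True: every back-to-back gap is huge
print(any(is_seeky(list(range(b, b + 8))) for b in bases))  # False: each stream alone is sequential
```

A completion-time-based detector would instead notice that, once the streams are warm in the disk cache, each request finishes quickly, and classify the queue as sequential.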

> The current behaviour of assuming it as seeky should work well enough,
> in fact it will be put in the seeky tree, and it can enjoy the seeky
> tree quantum of time. If the second round takes a short time, it will
> be able to schedule a third round again after the idle time.
> If there are other seeky processes competing for the tree, the cache
> can be cleared by the time it gets back to your 4-stream process, so
> it will behave exactly as a seeky process from cfq's point of view.
> If the various accesses were submitted in parallel, the deep seeky
> queue logic should kick in and make sure the process gets a sequential
> quantum, rather than sharing it with other seeky processes, so
> depending on your disk, it could perform better.
Yes, the idle logic makes it OK, but this sounds like "make things wrong
first (in the seeky detection) and then fix them later (in the idle logic)".

> > 2. deep seeky queue idle. This makes RAID perform poorly, so I think we should
> > revert the logic. Deep queues are more common on high-end hardware, and on
> > such hardware we'd better not idle.
> > Note that currently we set a queue's slice after the first request finishes,
> > which means the drive has already idled for a little while. If the queue is
> > truly deep, new requests should already have come in, so idling isn't required.
What do you think about this? A seeky request takes a long time, so the
queue has effectively already been idling for a while.
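The argument can be sketched as a toy timeline (all numbers are assumed, not measurements): a truly deep queue still has requests queued when the first completion arrives, so the implicit wait until that completion already served as the idle window; only a shallow queue, which resubmits after seeing the completion, would need an explicit idle.

```python
# Toy timeline; every number here is invented for illustration.
# A deep queue issues several requests up front, so when the first
# one completes (and the slice starts), more requests are already
# queued. A shallow queue submits its next request only after seeing
# the completion, so only it benefits from an explicit idle window.

FIRST_COMPLETION_MS = 8  # assumed service time of one seeky request
THINK_TIME_MS = 2        # assumed gap before a shallow task resubmits

def needs_explicit_idle(backlog_at_first_completion):
    # If requests are already waiting when the slice starts, the
    # implicit wait for the first completion was idle time enough.
    return backlog_at_first_completion == 0

deep_queue_backlog = 3     # depth-4 task: 3 requests still in flight
shallow_queue_backlog = 0  # depth-1 task: nothing queued yet

print(needs_explicit_idle(deep_queue_backlog))     # False: no idling needed
print(needs_explicit_idle(shallow_queue_backlog))  # True: must wait THINK_TIME_MS
```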

> > It looks like Vivek posted a patch to revert it, but it was ignored.
> > http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
> I get a 404 here. I think you are seeing only one side of the coin.
> That logic is there mainly to ensure fairness between deep seeky
> processes and normal seeky processes that want low latency.
I don't understand. The logic doesn't protect non-deep processes, so how
could it give a normal seeky process low latency? Or do you have a test
case for this that I can analyze?
I tried a workload with one task driving depth 4 and one task driving
depth 16. The behavior appears unchanged with or without the logic.

> If you remove that logic, a single process making many parallel aio
> reads could completely swamp one machine, preventing other seeky
> processes from progressing.
> Instead of removing the logic completely, you should make the depth
> configurable, so multi-spindle storage could allow deeper queues
> before switching to the fairness-enforcing policy.
We already have too many tunables ;( And we have no way to get the
maximum depth a storage device can provide.
Could the driver detect that it's multi-spindle storage and report that
to the iosched, just like SSD detection?
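For reference, the configurable-depth idea could look roughly like this (a sketch only; the names and the default of 4 are assumptions, not the kernel's actual tunables):

```python
# Sketch of a configurable "deep queue" threshold; names and defaults
# here are hypothetical. A seeky queue driving more in-flight requests
# than the threshold is marked "deep" and gets its own idle slice;
# raising the threshold for multi-spindle storage would keep such
# queues on the ordinary seeky path instead.

class CfqQueueModel:
    def __init__(self, deep_depth_threshold=4):
        # Hypothetical tunable; multi-spindle storage could set it higher.
        self.deep_depth_threshold = deep_depth_threshold
        self.deep = False

    def on_dispatch(self, in_flight):
        # Mark the queue deep once its in-flight depth crosses the
        # threshold; the mark is sticky, as in the behavior under debate.
        if in_flight >= self.deep_depth_threshold:
            self.deep = True
        return self.deep

single_spindle = CfqQueueModel()                # assumed default of 4
raid = CfqQueueModel(deep_depth_threshold=32)   # assumed wider threshold for RAID

print(single_spindle.on_dispatch(16))  # True: depth-16 task marked deep
print(raid.on_dispatch(16))            # False: same task stays "normal" on RAID
```

This is exactly why a driver-reported hint (like the SSD/non-rotational flag) would help: the right threshold depends on how many spindles the storage can keep busy, which the scheduler cannot see on its own.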

Thanks,
Shaohua
