Re: [patch]cfq-iosched: delete deep seeky queue idle logic

From: Shaohua Li
Date: Sun Sep 25 2011 - 20:50:43 EST


On Fri, 2011-09-23 at 13:50 +0800, Corrado Zoccolo wrote:
> Il giorno 21/set/2011 13:16, "Shaohua Li" <shaohua.li@xxxxxxxxx> ha
> scritto:
> >
> > On Sat, 2011-09-17 at 03:25 +0800, Corrado Zoccolo wrote:
> > > On Fri, Sep 16, 2011 at 8:40 AM, Shaohua Li <shaohua.li@xxxxxxxxx>
> wrote:
> > > > On Fri, 2011-09-16 at 14:04 +0800, Corrado Zoccolo wrote:
> > > >> On Fri, Sep 16, 2011 at 5:09 AM, Shaohua Li
> <shaohua.li@xxxxxxxxx> wrote:
> > > >> > Recently Maxim and I discussed why his aiostress workload
> performs poorly. If
> > > >> > you didn't follow the discussion, here are the issues we
> found:
> > > >> > 1. cfq seeky dection isn't good. Assume a task accesses
> sector A, B, C, D, A+1,
> > > >> > B+1, C+1, D+1, A+2...Accessing A, B, C, D is random. cfq will
> detect the queue
> > > >> > as seeky, but since when accessing A+1, A+1 is already in
> disk cache, this
> > > >> > should be detected as sequential really. Not sure if any real
> workload has such
> > > >> > access patern, and seems not easy to have a clean fix too.
> Any idea for this?
> > > >>
> > > >> Not all disks will cache 4 independent streams, we can't make
> that
> > > >> assumption in cfq.
> > > > sure thing. we can't make such assumption. I'm thinking if we
> should
> > > > move the seeky detection in request finish. If time between two
> requests
> > > > finish is short, we thought the queue is sequential. This will
> make the
> > > > detection adaptive. but seems time measurement isn't easy.
> > > >
> > > >> The current behaviour of assuming it as seeky should work well
> enough,
> > > >> in fact it will be put in the seeky tree, and it can enjoy the
> seeky
> > > >> tree quantum of time. If the second round takes a short time,
> it will
> > > >> be able to schedule a third round again after the idle time.
> > > >> If there are other seeky processes competing for the tree, the
> cache
> > > >> can be cleared by the time it gets back to your 4 streams
> process, so
> > > >> it will behave exactly as a seeky process from cfq point of
> view.
> > > >> If the various accesses were submitted in parallel, the deep
> seeky
> > > >> queue logic should kick in and make sure the process gets a
> sequential
> > > >> quantum, rather than sharing it with other seeky processes, so
> > > >> depending on your disk, it could perform better.
> > > > yes, the idle logic makes it ok, but sounds like "make things
> wrong
> > > > first (in seeky detection) and then fix it later (the idle
> logic)".
> > > >
> > > >> > 2. deep seeky queue idle. This makes raid performs poorly. I
> would think we
> > > >> > revert the logic. Deep queue is more popular with high end
> hardware. In such
> > > >> > hardware, we'd better not do idle.
> > > >> > Note, currently we set a queue's slice after the first
> request is finished.
> > > >> > This means the drive already idles a little time. If the
> queue is truely deep,
> > > >> > new requests should already come in, so idle isn't required.
> > > > What did you think about this? Assume seeky request takes long
> time, so
> > > > the queue is already idling for a little time.
> > > I don't think I understand. If cfq doesn't idle, it will dispatch
> an
> > > other request from the same or an other queue (if present)
> > > immediately, until all possible in-flight requests are sent. Now,
> you
> > > depend on NCQ for the order requests are handled, so you cannot
> > > guarantee fairness any more.
> > >
> > > >
> > > >> > Looks Vivek used to post a patch to rever it, but it gets
> ignored.
> > > >> >
> http://us.generation-nt.com/patch-cfq-iosched-revert-logic-deep-queues-help-198339681.html
> > > >> I get a 404 here. I think you are seeing only one half of the
> medal.
> > > >> That logic is there mainly to ensure fairness between deep
> seeky
> > > >> processes and normal seeky processes that want low latency.
> > > > didn't understand it. The logic doesn't protect non-deep
> process. how
> > > > could it make the normal seeky process have low latency? or did
> you have
> > > > a test case for this, so I can analyze?
> > > > I tried a workload with one task drives depth 4 and one task
> drives
> > > > depth 16. Appears the behavior isn't changed w/wo the logic.
> > sorry for the delay.
> >
> > > Try a workload with one shallow seeky queue and one deep (16) one,
> on
> > > a single spindle NCQ disk.
> > > I think the behaviour when I submitted my patch was that both were
> > > getting 100ms slice (if this is not happening, probably some
> > > subsequent patch broke it).
> > > If you remove idling, they will get disk time roughly in
> proportion
> > > 16:1, i.e. pretty unfair.
> > I thought you are talking about a workload with one thread depth 4,
> and
> > the other thread depth 16. I did some tests here. In an old kernel,
> > without the deep seeky idle logic, the threads have disk time in
> > proportion 1:5. With it, they get almost equal disk time. SO this
> > reaches your goal. In a latest kernel, w/wo the logic, there is no
> big
> > difference (the 16 depth thread get about 5x more disk time). With
> the
> > logic, the depth 4 thread gets equal disk time in first several
> slices.
> > But after an idle expiration(mostly because current block plug hold
> > requests in task list and didn't add them to elevator), the queue
> never
> > gets detected as deep, because the queue dispatch request one by
> one. So
> > the logic is already broken for some time (maybe since block plug is
> > added).
> Could be that dispatching requests one by one is harming the
> performance, then?
Not really. Say 4 requests are running, the task dispatches a request
after one previous request is completed. requests are dispatching one by
one but there are still 4 requests running at any time. Checking the
in_flight requests are more precise for the deep detection.

Thanks,
Shaohua

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/