Re: [PATCH 4/4] cfq-iosched: fix corner cases in idling logic

From: Vivek Goyal
Date: Wed Dec 02 2009 - 10:02:44 EST


On Wed, Dec 02, 2009 at 09:47:59AM -0500, Jeff Moyer wrote:
> Vivek Goyal <vgoyal@xxxxxxxxxx> writes:
>
> > On Wed, Dec 02, 2009 at 03:14:22PM +0100, Corrado Zoccolo wrote:
> >> Hi Jeff,
> >> On Wed, Dec 2, 2009 at 2:42 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> >> > Corrado Zoccolo <czoccolo@xxxxxxxxx> writes:
> >> >
> >> >> Idling logic was disabled in some corner cases, leading to unfair share
> >> >> for noidle queues.
> >> >> * the idle timer was not armed if there were other requests in the
> >> >>   driver. unfortunately, those requests could come from other workloads,
> >> >>   or queues for which we don't enable idling. So we will check only
> >> >>   pending requests from the active queue
> >> >> * rq_noidle check on no-idle queue could disable the end of tree idle if
> >> >>   the last completed request was rq_noidle. Now, we will disable that
> >> >>   idle only if all the queues served in the no-idle tree had rq_noidle
> >> >>   requests.
> >> >>
> >> >> Reported-by: Vivek Goyal <vgoyal@xxxxxxxxxx>
> >> >> Signed-off-by: Corrado Zoccolo <czoccolo@xxxxxxxxx>
> >> >
> >> >> @@ -2606,17 +2608,27 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
> >> >>                       cfq_clear_cfqq_slice_new(cfqq);
> >> >>               }
> >> >>               /*
> >> >> -              * If there are no requests waiting in this queue, and
> >> >> -              * there are other queues ready to issue requests, AND
> >> >> -              * those other queues are issuing requests within our
> >> >> -              * mean seek distance, give them a chance to run instead
> >> >> -              * of idling.
> >> >> +              * Idling is not enabled on:
> >> >> +              * - expired queues
> >> >> +              * - idle-priority queues
> >> >> +              * - async queues
> >> >> +              * - queues with still some requests queued
> >> >> +              * - when there is a close cooperator
> >> >>                */
> >> >
> >> > I'm not sure this logic is correct.  Is this for the 2.6.33 branch?
> >> Yes.
> >> > If so, the coop flag now means that multiple processes share the same
> >> > cfqq.  Are you sure this is the right thing to do for close cooperators?
> >> I'm not sure. I didn't change the logic for close cooperators:
>
> Heh, right you are.
>
> >> - else if (cfqq_empty && !cfq_close_cooperator(cfqd, cfqq) &&
> >> - sync && !rq_noidle(rq))
> >> - cfq_arm_slice_timer(cfqd);
> >> + else if (sync && cfqq_empty &&
> >> + !cfq_close_cooperator(cfqd, cfqq)) {
> >> + cfqd->noidle_tree_requires_idle |= !rq_noidle(rq);
> >>
> >> I changed the rq_noidle part, and rewrote the comment to be aligned
> >> with the code.
> >> So I don't mind if you improve (or just remove) the close cooperator part.
> >> Probably, you should do a test where close cooperating processes are competing
> >> with a sequential reader, to see the effect of idling or not on them.
> >>
> >
> > I also can't find what's wrong with this. Previously we were not merging
> > close cooperators in a single queue. So if we found a close cooperator
> > we chose to not idle and move to that close cooperator. Now we try to
> > merge all the close cooperators in a single queue. But that merging has
> > not taken place yet and will happen when next request comes.
>
> The coop flag is not set until the merge has taken place.
>
> > A normal sequential reader will not find the close cooperator. Only the
> > queues which should be merged will find the close cooperator. If anyway
> > these queues are going to be merged soon, there is probably no point in
> > continuing to idle on this queue in case we found a close cooperator.
> >
> > So, to me even in new code by jeff, it probably is fine to continue with
> > policy of not idling if we found a close cooperator.
>
> That would mean changing the check from cfqq_coop to cfqq->new_queue !=
> NULL.

Does it make a big difference? cfq_close_cooperator() does not seem to
rely on the coop flag. It will return a queue if it thinks there is a
close cooperator, irrespective of whether cfqq->new_cfqq has been set up
yet. IIUC, cfqq->new_cfqq will be set in select_queue(). So if
select_queue() has not run yet, cfqq->new_cfqq is NULL even though we
have a close cooperator.
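
To illustrate the distinction, here is a minimal userspace sketch, not the
kernel code. struct toy_cfqq, TOY_CLOSE_THR and toy_close_cooperator() are
made-up names, and the distance rule is only an assumption standing in for
whatever cfq_close_cooperator() really checks. The point is just that a
proximity-based lookup can return a queue while coop is clear and new_cfqq
is still NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cfq queue; NOT the kernel's struct cfq_queue. */
struct toy_cfqq {
	bool coop;                 /* set only after a merge has happened */
	struct toy_cfqq *new_cfqq; /* merge target, set by queue selection */
	long last_pos;             /* position of the last request issued */
};

/* Hypothetical closeness threshold, in sectors (assumption). */
#define TOY_CLOSE_THR 8

/*
 * Proximity-based lookup. It looks only at request positions and
 * never consults cur->coop or cur->new_cfqq.
 */
static struct toy_cfqq *toy_close_cooperator(struct toy_cfqq *cur,
					     struct toy_cfqq *queues[],
					     size_t n)
{
	assert(cur != NULL);

	for (size_t i = 0; i < n; i++) {
		struct toy_cfqq *q = queues[i];
		long d;

		if (q == cur)
			continue;
		d = q->last_pos - cur->last_pos;
		if (d < 0)
			d = -d;
		if (d <= TOY_CLOSE_THR)
			return q; /* close cooperator found */
	}
	return NULL;
}
```

With queues at positions 100, 104 and 5000, the lookup for the first queue
returns the second one even though neither coop nor new_cfqq has been
touched.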

But I guess this condition will not be hit often, as select_queue() runs
very frequently on NCQ hardware, and the moment select_queue() finds a
close cooperator it will expire the current queue, so the above check will
not even get a chance to run.

So IIUC, if we are here, cfqq->new_cfqq is always NULL; otherwise
select_queue() must have expired us by now and we would not be here. So we
can either remove the check completely or just keep the above check.
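
The ordering argument can be sketched in userspace like this. It is a toy
model under my assumptions, not the real scheduler: struct toy_queue,
struct toy_sched, toy_select_queue() and toy_may_idle() are hypothetical
names, and the only behavior modeled is "queue selection records the merge
target and expires the active queue in the same step".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy queue; NOT the kernel's struct cfq_queue. */
struct toy_queue {
	struct toy_queue *new_cfqq; /* merge target, if one was chosen */
};

/* Toy scheduler state; NOT the kernel's struct cfq_data. */
struct toy_sched {
	struct toy_queue *active; /* queue currently being served */
};

/*
 * Toy queue selection (assumed behavior): when a close cooperator is
 * found, remember it in new_cfqq and expire the active queue at once.
 * Returns true if the active queue was expired.
 */
static bool toy_select_queue(struct toy_sched *s, struct toy_queue *coop)
{
	assert(s != NULL);

	if (s->active == NULL || coop == NULL)
		return false;
	s->active->new_cfqq = coop;
	s->active = NULL; /* active queue is expired */
	return true;
}

/*
 * Toy completion-path check: idling is only considered for the queue
 * that is still active, so an expired queue never gets this far.
 */
static bool toy_may_idle(const struct toy_sched *s, const struct toy_queue *q)
{
	assert(s != NULL && q != NULL);
	return s->active == q;
}
```

In this model, whenever toy_may_idle() returns true for a queue, its
new_cfqq is still NULL, which is why checking new_cfqq at that point would
add nothing.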

Thanks
Vivek