Re: [patch 0/4] [RFC] Another proportional weight IO controller

From: Fabio Checconi
Date: Thu Nov 20 2008 - 01:53:20 EST


> From: Aaron Carroll <aaronc@xxxxxxxxxxxxxxxxxx>
> Date: Thu, Nov 20, 2008 03:45:02PM +1100
>
> Fabio Checconi wrote:
> >>> Fabio Checconi wrote:
> >>>> - To detect hw tagging in BFQ we consider a sample valid iff the
> >>>> number of requests that the scheduler could have dispatched (given
> >>>> by cfqd->rb_queued + cfqd->rq_in_driver, i.e., the ones still into
> >>>> the scheduler plus the ones into the driver) is higher than the
> >>>> CFQ_HW_QUEUE_MIN threshold. This obviously caused no problems
> >>>> during testing, but the way CFQ uses it now seems a little bit
> >>>> strange.
> >>> BFQ's tag detection logic is broken in the same way that CFQ's used to
> >>> be. Explanation is in this patch:
> >>>
> >> If you look at bfq_update_hw_tag(), the logic introduced by the patch
> >> you mention is still there; BFQ starts with ->hw_tag = 1, and updates it
>
> Yes, I missed that. So which part of CFQ's hw_tag detection is strange?
>

I just think it is rather counterintuitive to consider a sample
invalid when you have, say, rq_in_driver = 1, 2, 3 or 4 plus another
4 queued requests. Considering the actual number of requests that
could have been dispatched seemed more straightforward than
considering the two values separately.

Anyway, I think the validity of the samples is a minor issue; the
throughput loss you experienced is the more serious one.
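
Just to make the comparison concrete, here is a quick user-space sketch
of the two validity tests discussed in this thread; the struct and
function names are made up, only the two conditions come from the
actual schedulers:

#include <stdbool.h>

#define CFQ_HW_QUEUE_MIN 5

struct sched_stats {
	int rq_in_driver;	/* requests already dispatched to the driver */
	int queued;		/* requests still sitting in the scheduler */
};

/* BFQ-style test: look at everything that could have been dispatched. */
static bool sample_valid_bfq(const struct sched_stats *s)
{
	return s->rq_in_driver + s->queued >= CFQ_HW_QUEUE_MIN;
}

/* CFQ-style test: one of the two counters alone must exceed the threshold. */
static bool sample_valid_cfq(const struct sched_stats *s)
{
	return s->queued > CFQ_HW_QUEUE_MIN || s->rq_in_driver > CFQ_HW_QUEUE_MIN;
}

With rq_in_driver = 4 and queued = 4 the first test accepts the sample
and the second rejects it, which is exactly the case I find
counterintuitive.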


> >> every 32 valid samples. What changed WRT your patch, apart from the
> >> number of samples, is that the condition for a sample to be valid is:
> >>
> >> bfqd->rq_in_driver + bfqd->queued >= 5
> >>
> >> while in your patch it is:
> >>
> >> cfqd->rq_queued > 5 || cfqd->rq_in_driver > 5
> >>
> >> We preferred the first one because that sum better reflects the number
> >> of requests that could have been dispatched, and I don't think that this
> >> is wrong.
>
> I think it's fine too. CFQ's condition accounts for a few rare situations,
> such as the device stalling or hw_tag being updated right after a bunch of
> requests are queued. They are probably irrelevant, but can't hurt.
>
> >> There is a problem, but it's not within the tag detection logic itself.
> >> From some quick experiments, what happens is that when a process starts,
> >> CFQ considers it seeky (*), BFQ doesn't. As a side effect BFQ does not
> >> always dispatch enough requests to correctly detect tagging.
> >>
> >> At the first seek you cannot tell if the process is going to be seeky
> >> or not, and we have chosen to consider it sequential because it improved
> >> fairness in some sequential workloads (the CIC_SEEKY heuristic is used
> >> also to determine the idle_window length in [bc]fq_arm_slice_timer()).
> >>
> >> Anyway, we're dealing with heuristics, and they tend to favor some
> >> workloads over others. If recovering this throughput loss is more
> >> important than the transient unfairness due to short idling windows assigned
> >> to sequential processes when they start, I've no problem switching
> >> the CIC_SEEKY logic to consider a process seeky when it starts.
> >>
> >> Thank you for testing and for pointing out this issue, we missed it
> >> in our testing.
> >>
> >>
> >> (*) to be correct, the initial classification depends on the position
> >> of the first accessed sector.
> >
> > Sorry, I forgot the patch... This seems to solve the problem with
> > your workload here, does it work for you?
>
> Yes, it works fine now :)
>

Thank you very much for trying it.
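
In case it helps to picture the CIC_SEEKY discussion quoted above, this
is a toy user-space model of the heuristic's shape; the names and the
8192-sector cutoff are invented for illustration, and this is not the
patch itself:

#include <stdbool.h>

#define SEEKY_THRESH 8192ULL	/* assumed cutoff, in sectors */

struct seek_stats {
	unsigned long long last_sector;
	unsigned long long seek_mean;	/* decayed mean seek distance */
	bool seeky;
};

/* The open question: what do we assume before the first real sample? */
static void seek_stats_init(struct seek_stats *s, bool start_seeky)
{
	s->last_sector = 0;
	s->seek_mean = 0;
	s->seeky = start_seeky;
}

static void seek_stats_update(struct seek_stats *s, unsigned long long sector)
{
	unsigned long long dist = sector > s->last_sector ?
		sector - s->last_sector : s->last_sector - sector;

	/* 7/8 decay, roughly in the spirit of CFQ's seek_mean. */
	s->seek_mean = (7 * s->seek_mean + dist) / 8;
	s->last_sector = sector;
	s->seeky = s->seek_mean > SEEKY_THRESH;
}

Since last_sector starts at zero, the very first update classifies the
task by the absolute position of the first accessed sector, which is
what the (*) footnote above refers to.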


> However, hw_tag detection (in CFQ and BFQ) is still broken in a few ways:
> * If you go from queue_depth=1 to queue_depth=large, it's possible that
> the detection logic fails. This could happen when setting queue_depth
> to a larger value at boot, which seems a reasonable situation.

I think the transition of hw_tag from 1 to 0 can happen quite easily,
and may depend only on the workload, while getting back to 1 is more
difficult: when hw_tag is 0 there may be too few dispatches to detect
queueing...
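
Roughly, the situation looks like this (user-space sketch, invented
names; the 32-sample window and the depth threshold are the values
mentioned earlier in the thread):

#include <stdbool.h>

#define HW_TAG_SAMPLES	32
#define DEPTH_THRESH	5

struct hw_tag_state {
	bool hw_tag;		/* BFQ starts optimistic, with hw_tag = 1 */
	int valid_samples;
	int peak_in_driver;	/* deepest rq_in_driver seen in this window */
};

static void hw_tag_sample(struct hw_tag_state *st, int rq_in_driver, int queued)
{
	if (rq_in_driver > st->peak_in_driver)
		st->peak_in_driver = rq_in_driver;

	/* Samples taken under light load tell us nothing about tagging. */
	if (rq_in_driver + queued < DEPTH_THRESH)
		return;

	if (++st->valid_samples < HW_TAG_SAMPLES)
		return;

	/*
	 * Going 1 -> 0 only needs a window of shallow samples; going
	 * 0 -> 1 needs the driver depth to grow while the scheduler is
	 * idling, so both valid samples and deep peaks become rare.
	 */
	st->hw_tag = st->peak_in_driver > DEPTH_THRESH;
	st->valid_samples = 0;
	st->peak_in_driver = 0;
}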


> * It depends too much on the hardware. If you have a seeky load on a
> fast disk with a unit queue depth, idling sucks for performance (I
> imagine this is particularly bad on SSDs). If you have any disk with
> a deep queue, not idling sucks for fairness.

Agreed. This fairness vs. throughput conflict is very workload-dependent
too.
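
For what it's worth, the decision behind this conflict boils down to
something like the sketch below (invented names; it only mirrors the
kind of check done in the *_arm_slice_timer() paths mentioned earlier):

#include <stdbool.h>

static bool worth_idling(bool hw_tag, bool task_is_seeky)
{
	/*
	 * Seeky task on a tagged device: idling only wastes time the
	 * device could spend reordering a deep queue, so favor
	 * throughput.  Otherwise keep the idle window and favor
	 * fairness, at the cost of throughput on fast devices.
	 */
	if (hw_tag && task_is_seeky)
		return false;
	return true;
}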


> I suppose CFQ's slice_resid is supposed to help here, but as far as I can
> tell, it doesn't do a thing.
>