Re: [PATCH] cfq: Take whether cfq group is changed into account when choosing service tree

From: Gui Jianfeng
Date: Sun Dec 13 2009 - 21:44:14 EST


Vivek Goyal wrote:
> On Fri, Dec 11, 2009 at 07:01:14PM +0100, Corrado Zoccolo wrote:
>> Hi guys,
>> On Fri, Dec 11, 2009 at 4:07 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>>> On Fri, Dec 11, 2009 at 01:02:10PM +0800, Gui Jianfeng wrote:
>>>> Currently, with the IO Controller introduced, CFQ chooses the cfq group
>>>> at the top, and then chooses the service tree. So we need to take
>>>> whether the cfq group has changed into account when deciding whether
>>>> we should start choosing the service tree from scratch.
>>>>
>>> I am not able to understand the need/purpose of this patch. Once we
>>> have switched groups during scheduling, why should we reset the order
>>> of workloads within the group?
>> I understand it, and in fact I was thinking about this.
>> The idea is the same as with priorities. If we have not serviced a group
>> for a while, we want to start with the no-idle workload to reduce its latency.
>>
>> Unfortunately, a group may have too small a share, which could cause some
>> workloads to be starved, as Vivek correctly points out, and we should
>> avoid this.
>> It should be easy to reproduce by testing a "many groups with mixed
>> workloads" scenario with group_isolation=1.
>>
>> Moreover, even though the approach is groups-on-top, when group isolation
>> is not set (i.e. the default), the non-root groups will only contain the
>> sync-idle queues, so logically it is much closer to workload-on-top than
>> it appears from the code.
>>
>> I think the net result of this patch, when group isolation is not set,
>> is to select the no-idle workload first only when entering the root
>> group, thus slightly penalizing the async workload.
>
> It also penalizes the sync-idle workload in the root group.
>
> The higher latencies for the sync-noidle workload within a group will be
> observed only if group_isolation=1 and the group has a low weight. I think
> in general this problem should be solved by bumping up the weight of the
> group; otherwise, just live with it to ensure fairness for the sync-idle
> workload in the group.
>
> With group_isolation=0, the problem should be less visible, as the root
> group runs with the max weight in the system (1000). In general I will
> assume that one chooses system weights in such a manner that the root
> group does not starve.
>
>
>> Gui, if you observed improvements with this patch, you can probably obtain
>> them without the starvation drawback by making it conditional on
>> !group_isolation.
>>
>> BTW, since you always use cfqg_changed and prio_changed OR-ed together, you
>> can have just one argument, called e.g. boost_no_idle, and you pass
>> (!group_isolation && cfqg_changed) || prio_changed.
>>
>
> Because it will impact the share of the sync-idle workload in the root
> group and the async workload system-wide, I am not too keen on making this
> change. I would rather rely on the root group having a higher weight.
>
> So far this reset of the workload order happens only when we switch to a
> higher prio class. So if there is some RT activity happening in the
> system, we will constantly be resetting the workload order for the BE
> class, and we will keep servicing the sync-noidle workload while starving
> the sync-idle and async workloads.
>
> So it is probably still ok for priority class switching, but I am not too
> keen on doing this on a group switching event. This event will be too
> frequent if there is a significant number of groups in the system, and we
> don't want to starve the root group's sync-idle workload and the
> system-wide async workload.
>
> If somebody is not happy with the latencies of the sync-noidle workload,
> the easier way to solve that issue is to readjust the group weights.
>
> Thanks
> Vivek

Hi Vivek, Corrado

Thanks to Corrado for the explanation; I have the same concern in mind.

Consider the following code in choose_service_tree():

	prio_changed = (cfqd->serving_prio != previous_prio);
	st = service_tree_for(cfqg, cfqd->serving_prio, cfqd->serving_type,
			      cfqd);
	count = st->count;
	/*
	 * If priority didn't change, check workload expiration,
	 * and that we still have other queues ready
	 */
	if (!prio_changed && count &&
	    !time_after(jiffies, cfqd->workload_expires))
		return;

Another reason I made this change is that currently, if serving_prio isn't
changed, we resume from where we left off. But with io groups introduced,
cfqd->serving_prio and previous_prio might come from different groups, so I
don't think "prio_changed" still makes sense when the group has changed.
Likewise, cfqd->workload_expires might have been set by a workload in
another group when we are deciding whether to continue serving from
"cfqd->serving_type".
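
Roughly, the check I have in mind looks like this (a sketch only;
"cfqg_changed" and "previous_cfqg" are illustrative names here, mirroring
how previous_prio is tracked, and not necessarily the final patch):

	/* Remember which group we were serving before re-choosing. */
	cfqg_changed = (cfqd->serving_group != previous_cfqg);
	prio_changed = (cfqd->serving_prio != previous_prio);
	st = service_tree_for(cfqg, cfqd->serving_prio, cfqd->serving_type,
			      cfqd);
	count = st->count;
	/*
	 * Resume where we left off only if neither the group nor the
	 * priority class changed, and the current workload has not expired.
	 */
	if (!cfqg_changed && !prio_changed && count &&
	    !time_after(jiffies, cfqd->workload_expires))
		return;

This would also fit Corrado's boost_no_idle suggestion: the two flags could
be folded into a single argument, (!group_isolation && cfqg_changed) ||
prio_changed, if we only want the boost when group isolation is off.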

Thanks,
Gui
