Re: [RFC] [PATCH] cfq-iosched: add cfq group hierarchical scheduling support

From: Nauman Rafique
Date: Wed Sep 01 2010 - 11:49:56 EST


On Wed, Sep 1, 2010 at 1:50 AM, Gui Jianfeng <guijianfeng@xxxxxxxxxxxxxx> wrote:
> Vivek Goyal wrote:
>> On Tue, Aug 31, 2010 at 08:40:19AM -0700, Nauman Rafique wrote:
>>> On Tue, Aug 31, 2010 at 5:57 AM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>>>> On Tue, Aug 31, 2010 at 08:29:20AM +0800, Gui Jianfeng wrote:
>>>>> Vivek Goyal wrote:
>>>>>> On Mon, Aug 30, 2010 at 02:50:40PM +0800, Gui Jianfeng wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> This patch enables cfq group hierarchical scheduling.
>>>>>>>
>>>>>>> With this patch, you can create a cgroup directory deeper than level 1.
>>>>>>> I/O bandwidth is now distributed hierarchically. For example,
>>>>>>> we create cgroup directories as follows (the number represents the weight):
>>>>>>>
>>>>>>>              Root grp
>>>>>>>             /        \
>>>>>>>      grp_1(100)    grp_2(400)
>>>>>>>      /        \
>>>>>>> grp_3(200)  grp_4(300)
>>>>>>>
>>>>>>> If grp_2, grp_3 and grp_4 are contending for I/O bandwidth,
>>>>>>> grp_2 gets 80% of the total bandwidth.
>>>>>>> Among the subgroups, grp_3 gets 8% (20% * 40%) and grp_4 gets 12% (20% * 60%).
>>>>>>>
>>>>>>> Design:
>>>>>>>   o Each cfq group has its own group service tree.
>>>>>>>   o Each cfq group contains a "group schedule entity" (gse) that
>>>>>>>     schedules on the parent cfq group's service tree.
>>>>>>>   o Each cfq group contains a "queue schedule entity" (qse) that
>>>>>>>     represents all cfqqs located in this cfq group. It schedules
>>>>>>>     on this group's service tree. For the time being, the root group
>>>>>>>     qse's weight is 1000, and a subgroup qse's weight is 500.
>>>>>>>   o All gses and the qse that belong to the same cfq group schedule
>>>>>>>     on the same group service tree.
>>>>>> Hi Gui,
>>>>>>
>>>>>> Thanks for the patch. I have a few questions.
>>>>>>
>>>>>> - So what does the hierarchy look like w.r.t. the root group? Something
>>>>>>   as follows?
>>>>>>
>>>>>>
>>>>>>                     root
>>>>>>                    / | \
>>>>>>                  q1  q2 G1
>>>>>>
>>>>>> Assume there are two processes doing IO in the root group, q1 and q2 are
>>>>>> the cfqq queues for those processes, and G1 is the cgroup created by the user.
>>>>>>
>>>>>> If yes, then what algorithm do you use to do scheduling between q1, q2
>>>>>> and G1? IOW, currently we have two algorithms operating in CFQ: one for
>>>>>> cfqqs and the other for groups. The group algorithm does not use the logic
>>>>>> of cfq_slice_offset().
>>>>> Hi Vivek,
>>>>>
>>>>> This patch doesn't break the original scheduling logic, that is, cfqg => st => cfqq.
>>>>> If q1 and q2 are in the root group, I treat q1 and q2 together as one queue sched
>>>>> entity, and it schedules on the root group service tree alongside G1, as follows:
>>>>>
>>>>>                  root group
>>>>>                 /          \
>>>>>         qse(q1,q2)        gse(G1)
>>>>>
>>>> Ok. That's interesting. That raises another question: what should the
>>>> hierarchy look like? IOW, how should queues and groups be treated in the
>>>> hierarchy?
>>>>
>>>> The CFS cpu scheduler treats queues and groups at the same level, as
>>>> follows.
>>>>
>>>>                       root
>>>>                      / | \
>>>>                    q1  q2 G1
>>>>
>>>> In the past I had raised this question, and Jens and Corrado liked treating
>>>> queues and groups at the same level.
>>>>
>>>> Logically, q1, q2 and G1 are all children of root, so it makes sense to
>>>> treat them at the same level and not bundle q1 and q2 into a single
>>>> entity or group.
>>>>
>>>> One possible way forward could be this.
>>>>
>>>> - Treat queues and groups at the same level (like CFS).
>>>>
>>>> - Get rid of the cfq_slice_offset() logic. That means that without idling
>>>>   on, there will be no ioprio difference between cfq queues. I think that
>>>>   as of today that logic helps in so few situations that I would not mind
>>>>   getting rid of it. Just that Jens should agree to it.
>>>>
>>>> - This new scheme will break the existing semantics of the root group
>>>>   being at the same level as child groups. To avoid that, we can probably
>>>>   implement two modes (flat and hierarchical), similar to what the
>>>>   memory cgroup controller has done. Maybe one tunable in the root cgroup
>>>>   of blkio, "use_hierarchy". By default everything will be in flat mode,
>>>>   and if the user wants hierarchical control, he needs to set
>>>>   use_hierarchy in the root group.
>>> Vivek, maybe I am reading you wrong here. But you are first
>>> suggesting adding more complexity to treat queues and groups at the
>>> same level. Then you are suggesting adding even more complexity to fix
>>> the problems caused by that approach.
>>>
>>> Why do we need to treat queues and group at the same level? "CFS does
>>> it" is not a good argument.
>>
>> Sure, it is not a very good argument, but at the same time one would need
>> a very good argument for why we should do things differently.
>>
>> - If a user has mounted the cpu and blkio controllers together and both
>>   controllers are viewing the same hierarchy differently, then it is
>>   odd. We need a good reason why a different arrangement makes sense.
>
> Hi Vivek,
>
> Even if we mount cpu and blkio together, to me it's OK for cpu and blkio
> to have their own logic, since they are totally different cgroup subsystems.
>
>>
>> - To me, both groups and cfq queues are children of the root group, and it
>>   makes sense to treat them as independent children instead of putting
>>   all the queues in one logical group which inherits the weight of
>>   the parent.
>>
>> - With this new scheme, I am finding it hard to visualize the hierarchy.
>>   How do you assign the weights to the queue entities of a group? It is more
>>   like an invisible group within a group. We shall have to create a new
>>   tunable which can specify the weight for this hidden group.
>
> For the time being, the root "qse" weight is 1000 and the others are 500; they
> don't inherit the weight of the parent. I was thinking that maybe we can determine
> the qse weight in terms of the number of queues and the weights in this group and
> its subgroups.
>
> Thanks,
> Gui
>
>>
>>
>> So in summary, I like the "queue at same level as group" scheme for
>> the following reasons.
>>
>> - It is more intuitive to visualize and implement. It follows the true
>>   hierarchy as seen by the cgroup file system.
>>
>> - CFS has already implemented this scheme, so we need a strong argument
>>   to justify why we should not follow the same thing, especially for
>>   the case where the user has co-mounted the cpu and blkio controllers.
>>
>> - It can achieve the same goal as the "hidden group" proposal just by
>>   creating a cgroup explicitly and moving all threads into that group.
>>
>> Why do you think that "hidden group" proposal is better than "treating
>> queue at same level as group" ?

There are multiple reasons why the "hidden group" proposal is the better
approach.

- "Hidden group" would allow us to keep scheduling queues using the
CFQ queue scheduling logic, and it does not require any major changes in
CFQ. Aren't we already using that approach to deal with queues in the
root group?

- If queues and groups are treated at the same level, queues can end
up in the root cgroup, and we cannot put an upper bound on the number of
those queues. Those queues can consume system resources in proportion
to their number, causing the performance of groups to suffer. If we
have a "hidden group", we can configure it with a small weight, and that
would limit the impact the queues in the root group can have.
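
To put rough numbers on the second point, here is a toy model rather than anything taken from the patch. It assumes bandwidth is split purely in proportion to weight among the entities on the root service tree, and every weight in it (QUEUE_WEIGHT, GROUP_WEIGHT, HIDDEN_WEIGHT) is made up for illustration. It compares the share one user-created group G1 gets under the two schemes as the number of busy queues in the root cgroup grows.

```python
# Toy model only: weight-proportional sharing on the root service tree.
# All weights below are hypothetical and do not come from the patch.

QUEUE_WEIGHT = 500   # assumed weight of each cfqq in the root cgroup
GROUP_WEIGHT = 500   # assumed weight of the user-created group G1
HIDDEN_WEIGHT = 100  # assumed small weight configured for the hidden group


def g1_share_flat(nr_root_queues):
    """G1's share when every root queue is a sibling of G1."""
    total = nr_root_queues * QUEUE_WEIGHT + GROUP_WEIGHT
    return GROUP_WEIGHT / total


def g1_share_hidden(nr_root_queues):
    """G1's share when all root queues sit behind one hidden group."""
    # The hidden group competes as a single entity, so only whether it
    # is busy matters, not how many queues it contains.
    hidden = HIDDEN_WEIGHT if nr_root_queues > 0 else 0
    return GROUP_WEIGHT / (hidden + GROUP_WEIGHT)


if __name__ == "__main__":
    for n in (1, 4, 9):
        print(n, round(g1_share_flat(n), 3), round(g1_share_hidden(n), 3))
```

With these made-up weights, G1's share in the flat scheme drops from 50% with one root queue to 10% with nine, while with the hidden group it stays at about 83% no matter how many queues pile up in the root cgroup.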

>>
>> Thanks
>> Vivek
>>
>>
>
>