Re: [PATCH v3 3/7] sched: throttle cfs_rq entities which exceed their local quota
From: Paul Turner
Date: Wed Oct 13 2010 - 02:53:31 EST
On Tue, Oct 12, 2010 at 11:47 PM, Bharata B Rao
<bharata@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Oct 12, 2010 at 11:44:29PM -0700, Paul Turner wrote:
>> On Tue, Oct 12, 2010 at 11:34 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>> > On Tue, 12 Oct 2010 13:22:02 +0530
>> > Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx> wrote:
>> >
>> >> sched: throttle cfs_rq entities which exceed their local quota
>> >>
>> >> From: Paul Turner <pjt@xxxxxxxxxx>
>> >>
>> >> In account_cfs_rq_quota() (via update_curr()) we track consumption versus a
>> >> cfs_rq's local quota and whether there is global quota available to continue
>> >> enabling it in the event we run out.
>> >>
>> >> This patch adds the required support for the latter case, throttling entities
>> >> until quota is available to run. Throttling dequeues the entity in question
>> >> and sends a reschedule to the owning cpu so that it can be evicted.
>> >>
>> >> The following restrictions apply to a throttled cfs_rq:
> >> >> - It is dequeued from the sched_entity hierarchy and restricted from being
> >> >> re-enqueued. This means that new/waking children of this entity will be
> >> >> queued up to it, but not past it.
> >> >> - It does not contribute to weight calculations in tg_shares_up().
> >> >> - If the cfs_rq of the cpu we are trying to pull from is throttled, it is
> >> >> ignored by the load balancer in __load_balance_fair() and
> >> >> move_one_task_fair().
>> >>
>> >> Signed-off-by: Paul Turner <pjt@xxxxxxxxxx>
>> >> Signed-off-by: Nikhil Rao <ncrao@xxxxxxxxxx>
>> >> Signed-off-by: Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx>
>> >> ---
>> >> kernel/sched.c | 12 ++++++++
>> >> kernel/sched_fair.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++----
>> >> 2 files changed, 76 insertions(+), 6 deletions(-)
>> >>
>> >> --- a/kernel/sched.c
>> >> +++ b/kernel/sched.c
>> >> @@ -387,6 +387,7 @@ struct cfs_rq {
>> >> #endif
>> >> #ifdef CONFIG_CFS_BANDWIDTH
>> >> u64 quota_assigned, quota_used;
>> >> + int throttled;
>> >> #endif
>> >> #endif
>> >> };
>> >> @@ -1668,6 +1669,8 @@ static void update_group_shares_cpu(stru
>> >> }
>> >> }
>> >>
>> >> +static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
>> >> +
>> >
> >> > I'm just curious: does a static inline forward declaration actually get inlined?
>> >
>>
>> Hm. This function is tiny, I should just move it up, thanks.
>>
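As a sketch of the "just move it up" fix: defining the tiny helper above its first user removes the need for the static inline forward declaration. A forward-declared static inline is legal C, and whether any call is inlined is always the compiler's decision; defining the body first simply makes it visible at every call site. The struct layout and the body of cfs_rq_throttled() below are assumptions (the hunk does not show the real definition), and sum_group_weight() is a hypothetical helper that mirrors the weight loop in the tg_shares_up() hunk.

```c
#include <assert.h>

/* Hypothetical minimal stand-ins for the kernel structures in the hunk. */
struct load_weight {
	unsigned long weight;
};

struct cfs_rq {
	struct load_weight load;
	int throttled;		/* the field added by this patch */
};

/*
 * Defined above its first user, so no forward declaration is needed.
 * (Assumed body: the hunk does not show the real definition.)
 */
static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
{
	return cfs_rq->throttled;
}

/*
 * Hypothetical helper mirroring the loop in the tg_shares_up() hunk:
 * a throttled cfs_rq contributes zero weight to the sum.
 */
static unsigned long sum_group_weight(struct cfs_rq **cfs_rqs, int nr_cpus)
{
	unsigned long sum = 0;
	int i;

	assert(nr_cpus >= 0);
	for (i = 0; i < nr_cpus; i++) {
		unsigned long weight;

		if (!cfs_rq_throttled(cfs_rqs[i]))
			weight = cfs_rqs[i]->load.weight;
		else
			weight = 0;
		sum += weight;
	}
	return sum;
}
```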
>> >> /*
>> >> * Re-compute the task group their per cpu shares over the given domain.
>> >> * This needs to be done in a bottom-up fashion because the rq weight of a
>> >> @@ -1688,7 +1691,14 @@ static int tg_shares_up(struct task_grou
>> >> usd_rq_weight = per_cpu_ptr(update_shares_data, smp_processor_id());
>> >>
>> >> for_each_cpu(i, sched_domain_span(sd)) {
>> >> - weight = tg->cfs_rq[i]->load.weight;
>> >> + /*
>> >> + * bandwidth throttled entities cannot contribute to load
>> >> + * balance
>> >> + */
>> >> + if (!cfs_rq_throttled(tg->cfs_rq[i]))
>> >> + weight = tg->cfs_rq[i]->load.weight;
>> >> + else
>> >> + weight = 0;
>> >
> >> > Can cpu.shares and bandwidth control not be used simultaneously, or...
> >> > is this fair? I'm not familiar with the scheduler, but this seems to allow this tg to be boosted.
> >> > Could you add brief documentation of the spec/feature in the next post?
>> >
>>
>> Bandwidth control is orthogonal to shares; shares continue to control the
>> distribution of bandwidth while a group is within quota. Bandwidth control
>> only has a 'perceivable' effect when you exceed your reservation within a
>> quota period.
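A toy model of the point above, assuming several always-runnable groups sharing one CPU for a single period: time is split by shares, and a quota only changes the outcome when a group's proportional slice would exceed it, in which case that group is capped and the surplus is re-divided by shares among the others. This is a water-filling illustration, not the scheduler's algorithm; split_period(), NO_QUOTA and MAX_GROUPS are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_GROUPS 16
#define NO_QUOTA UINT64_MAX	/* hypothetical marker: no bandwidth limit */

/*
 * Toy model, not the kernel algorithm: n always-runnable groups share one
 * CPU for 'period' ns. Time is divided in proportion to shares; a group whose
 * proportional slice would exceed its quota is capped at the quota
 * (throttled) and the surplus is re-divided among the others by shares.
 */
static void split_period(int n, const unsigned long *shares,
			 const uint64_t *quota, uint64_t period,
			 uint64_t *runtime)
{
	int capped[MAX_GROUPS] = { 0 };
	uint64_t left = period;
	unsigned long active;
	int i, changed;

	assert(n > 0 && n <= MAX_GROUPS);
	do {
		uint64_t pass_left = left;

		changed = 0;
		active = 0;
		for (i = 0; i < n; i++)
			if (!capped[i])
				active += shares[i];
		if (active == 0)
			break;
		for (i = 0; i < n; i++) {
			if (capped[i] || quota[i] == NO_QUOTA)
				continue;
			/* Would this group's fair slice exceed its quota? */
			if (pass_left * shares[i] / active > quota[i]) {
				capped[i] = 1;
				runtime[i] = quota[i];
				left -= quota[i];
				changed = 1;
			}
		}
	} while (changed);

	/* Groups within quota split the remainder purely by shares. */
	for (i = 0; i < n; i++)
		if (!capped[i])
			runtime[i] = active ? left * shares[i] / active : 0;
}
```

For example, two groups with equal shares and a 100 ns period each get 50; capping the first at 20 leaves the second 80. With shares of 1024:3072 and a quota of 30 on the first group, the split stays 25:75, decided by shares alone, because the quota is never reached.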
>
> So if a group gets throttled because it's approaching its limit, it might
> not be possible to see perfect fairness between groups, since bandwidth
> control kind of takes priority.
It's two-fold:
A) The shares will be released so that they can be distributed to
other cpus within the group (where there may be quota remaining).
B) The weight will be hierarchically removed to *improve* fairness,
since those entities are not actually running (this fixes up the
accounting; since these calculations are lazy, we don't do it at the
time of throttle).
It shouldn't have negative implications for group:group fairness.
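Point A can be illustrated with a simplified model of how a group's shares are divided across CPUs in proportion to each CPU's cfs_rq weight. With a throttled cfs_rq reporting weight 0, as in the tg_shares_up() hunk, its portion flows to CPUs where the group may still have quota. distribute_shares() is a hypothetical helper; the kernel's update_group_shares_cpu() path has further details (such as clamping per-cpu shares) that are not modelled here.

```c
#include <assert.h>

/*
 * Hypothetical, simplified model: split a group's shares across CPUs in
 * proportion to each CPU's cfs_rq weight. A throttled cfs_rq counts as
 * weight 0 (as in the tg_shares_up() hunk), so its portion moves to the
 * CPUs where the group can still run. No minimum/maximum clamping.
 */
static void distribute_shares(unsigned long tg_shares,
			      const unsigned long *rq_weight,
			      const int *throttled, int nr_cpus,
			      unsigned long *cpu_shares)
{
	unsigned long sum = 0;
	int i;

	assert(nr_cpus > 0);
	for (i = 0; i < nr_cpus; i++)
		sum += throttled[i] ? 0 : rq_weight[i];

	for (i = 0; i < nr_cpus; i++) {
		unsigned long w = throttled[i] ? 0 : rq_weight[i];

		/* If every cfs_rq is throttled (sum == 0), no CPU gets shares. */
		cpu_shares[i] = sum ? tg_shares * w / sum : 0;
	}
}
```

With 1024 group shares and two CPUs of equal weight, each CPU gets 512; once the first CPU's cfs_rq is throttled, the second gets all 1024.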
>
> Regards,
> Bharata.
>