Re: [PATCH v3 2/7] sched: accumulate per-cfs_rq cpu usage

From: Nikhil Rao
Date: Wed Oct 13 2010 - 09:46:48 EST

Next message: Mel Gorman: "Re: [UnifiedV4 00/16] The Unified slab allocator (V4)"
Previous message: James Bottomley: "[GIT PULL] SCSI bug fixes for 2.6.36-rc7"
In reply to: Balbir Singh: "Re: [PATCH v3 2/7] sched: accumulate per-cfs_rq cpu usage"
Next in thread: Balbir Singh: "Re: [PATCH v3 2/7] sched: accumulate per-cfs_rq cpu usage"

On Wed, Oct 13, 2010 at 6:30 AM, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
> * Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx> [2010-10-12 13:21:09]:
>
>> sched: accumulate per-cfs_rq cpu usage
>>
>> From: Paul Turner <pjt@xxxxxxxxxx>
>>
>> Introduce account_cfs_rq_quota() to account bandwidth usage on the cfs_rq
>> level versus task_groups for which bandwidth has been assigned.  This is
>> tracked by whether the local cfs_rq->quota_assigned is finite or infinite
>> (RUNTIME_INF).
>>
>> For cfs_rq's that belong to a bandwidth constrained task_group we introduce
>> tg_request_cfs_quota() which attempts to allocate quota from the global pool
>> for use locally.  Updates involving the global pool are currently protected
>> under cfs_bandwidth->lock, local pools are protected by rq->lock.
>>
>> This patch only attempts to assign and track quota, no action is taken in the
>> case that cfs_rq->quota_used exceeds cfs_rq->quota_assigned.
>>
>> Signed-off-by: Paul Turner <pjt@xxxxxxxxxx>
>> Signed-off-by: Nikhil Rao <ncrao@xxxxxxxxxx>
>> Signed-off-by: Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx>
>> ---
>>  include/linux/sched.h |    4 ++++
>>  kernel/sched.c        |   13 +++++++++++++
>>  kernel/sched_fair.c   |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/sysctl.c       |   10 ++++++++++
>>  4 files changed, 77 insertions(+)
>>
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1898,6 +1898,10 @@ int sched_rt_handler(struct ctl_table *t
>>  		void __user *buffer, size_t *lenp,
>>  		loff_t *ppos);
>>
>> +#ifdef CONFIG_CFS_BANDWIDTH
>> +extern unsigned int sysctl_sched_cfs_bandwidth_slice;
>> +#endif
>> +
>>  extern unsigned int sysctl_sched_compat_yield;
>>
>>  #ifdef CONFIG_RT_MUTEXES
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -1929,6 +1929,19 @@ static const struct sched_class rt_sched
>>   * default: 0.5s
>>   */
>>  static u64 sched_cfs_bandwidth_period = 500000000ULL;
>> +
>> +/*
>> + * default slice of quota to allocate from global tg to local cfs_rq pool on
>> + * each refresh
>> + * default: 10ms
>> + */
>> +unsigned int sysctl_sched_cfs_bandwidth_slice = 10000UL;
>> +
>> +static inline u64 sched_cfs_bandwidth_slice(void)
>> +{
>> +	return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
>> +}
>> +
>>  #endif
>>
>>  #define sched_class_highest (&rt_sched_class)
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -267,6 +267,16 @@ find_matching_se(struct sched_entity **s
>>
>>  #endif	/* CONFIG_FAIR_GROUP_SCHED */
>>
>> +#ifdef CONFIG_CFS_BANDWIDTH
>> +static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
>> +{
>> +	return &tg->cfs_bandwidth;
>> +}
>> +
>> +static void account_cfs_rq_quota(struct cfs_rq *cfs_rq,
>> +		unsigned long delta_exec);
>> +#endif
>> +
>>
>>  /**************************************************************
>>   * Scheduling class tree data structure manipulation methods:
>> @@ -547,6 +557,9 @@ static void update_curr(struct cfs_rq *c
>>  		cpuacct_charge(curtask, delta_exec);
>>  		account_group_exec_runtime(curtask, delta_exec);
>>  	}
>> +#ifdef CONFIG_CFS_BANDWIDTH
>> +	account_cfs_rq_quota(cfs_rq, delta_exec);
>> +#endif
>>  }
>>
>>  static inline void
>> @@ -1130,6 +1143,43 @@ static void yield_task_fair(struct rq *r
>>  }
>>
>>  #ifdef CONFIG_CFS_BANDWIDTH
>> +static u64 tg_request_cfs_quota(struct task_group *tg)
>> +{
>> +	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
>> +	u64 delta = 0;
>> +
>> +	if (cfs_b->runtime > 0 || cfs_b->quota == RUNTIME_INF) {
>
> Quick question for cfs_b->quota == RUNTIME_INF, won't cfs_b->runtime
> be always > 0?

Hi Balbir,

cfs_b->runtime can be 0 if the task group exhausts its quota.
cfs_b->runtime is a counter that is periodically refreshed to
cfs_b->quota, and is decremented every time a cfs_rq requests a slice.
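
A rough userspace sketch of that behaviour, with the locking left out. The
names model_pool, model_refresh and model_request are made up for
illustration and are not part of the patch, and MODEL_RUNTIME_INF is only a
stand-in for RUNTIME_INF (assumed here to be the all-ones u64 value). The
refresh step follows the description above; this patch does not implement it
yet.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's RUNTIME_INF sentinel (assumed: all bits set). */
#define MODEL_RUNTIME_INF UINT64_MAX

/* Hypothetical userspace model of the global (per task_group) pool. */
struct model_pool {
	uint64_t quota;   /* budget per period in ns, or MODEL_RUNTIME_INF */
	uint64_t runtime; /* what is left of the budget in this period */
};

/* Periodic refresh as described above: runtime is reset to quota. */
static void model_refresh(struct model_pool *p)
{
	assert(p);
	p->runtime = p->quota;
}

/*
 * Follows the shape of tg_request_cfs_quota() in the quoted patch, minus
 * the spinlock: hand out at most one slice and charge it against the pool,
 * unless the group is unconstrained.
 */
static uint64_t model_request(struct model_pool *p, uint64_t slice)
{
	uint64_t delta = 0;

	assert(p);
	if (p->quota == MODEL_RUNTIME_INF)
		return slice;	/* unconstrained: runtime is never consulted */

	if (p->runtime > 0) {
		delta = p->runtime < slice ? p->runtime : slice;
		p->runtime -= delta;
	}
	return delta;
}
```

With quota = 25 and slice = 10, three requests after a refresh return 10, 10
and 5, and runtime then stays at 0 (further requests return 0) until the next
refresh. In the MODEL_RUNTIME_INF case the request returns a full slice and
never touches runtime.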

-Thanks,
Nikhil

>
>> +		raw_spin_lock(&cfs_b->lock);
>> +		/*
>> +		 * it's possible a bandwidth update has changed the global
>> +		 * pool.
>> +		 */
>> +		if (cfs_b->quota == RUNTIME_INF)
>> +			delta = sched_cfs_bandwidth_slice();
>> +		else {
>> +			delta = min(cfs_b->runtime,
>> +					sched_cfs_bandwidth_slice());
>> +			cfs_b->runtime -= delta;
>> +		}
>> +		raw_spin_unlock(&cfs_b->lock);
>> +	}
>> +	return delta;
>> +}
>> +
>> +static void account_cfs_rq_quota(struct cfs_rq *cfs_rq,
>> +		unsigned long delta_exec)
>> +{
>> +	if (cfs_rq->quota_assigned == RUNTIME_INF)
>> +		return;
>> +
>> +	cfs_rq->quota_used += delta_exec;
>> +
>> +	if (cfs_rq->quota_used < cfs_rq->quota_assigned)
>> +		return;
>> +
>> +	cfs_rq->quota_assigned += tg_request_cfs_quota(cfs_rq->tg);
>> +}
>> +
>>  static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
>>  {
>>  	return 1;
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -384,6 +384,16 @@ static struct ctl_table kern_table[] = {
>>  		.mode		= 0644,
>>  		.proc_handler	= proc_dointvec,
>>  	},
>> +#ifdef CONFIG_CFS_BANDWIDTH
>> +	{
>> +		.procname	= "sched_cfs_bandwidth_slice_us",
>> +		.data		= &sysctl_sched_cfs_bandwidth_slice,
>> +		.maxlen		= sizeof(unsigned int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec_minmax,
>> +		.extra1		= &one,
>> +	},
>> +#endif
>>  #ifdef CONFIG_PROVE_LOCKING
>>  	{
>>  		.procname	= "prove_locking",
>
> --
> 	Three Cheers,
> 	Balbir
>