Re: [patch 15/16] sched: return unused runtime on voluntary sleep

From: Paul Turner
Date: Mon Jun 27 2011 - 21:45:39 EST


On Thu, Jun 23, 2011 at 8:26 AM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> On Tue, 2011-06-21 at 00:17 -0700, Paul Turner wrote:
>> plain text document attachment (sched-bwc-simple_return_quota.patch)
>> When a local cfs_rq blocks we return the majority of its remaining quota to the
>> global bandwidth pool for use by other runqueues.
>
> OK, I saw return_cfs_rq_runtime() do that.
>
>> We do this only when the quota is current and there is more than
>> min_cfs_rq_quota [1ms by default] of runtime remaining on the rq.
>
> sure..
>
>> In the case where there are throttled runqueues and we have sufficient
>> bandwidth to meter out a slice, a second timer is kicked off to handle this
>> delivery, unthrottling where appropriate.
>
> I'm having trouble there, what's the purpose of the timer, you could
> redistribute immediately. None of this is well explained.
>

Current reasons:
- There was concern about thrashing the unthrottle path when a task
rapidly oscillates between runnable states. Using a timer inherently
limits this operation both in frequency and to a single cpu. I think
the move to using a throttled list (as opposed to having to poll all
cpus), plus the fact that we only return quota in excess of
min_cfs_rq_quota, probably mitigates this to the point where we could
just do away with the timer and return quota directly in the put path.

- The aesthetics of releasing rq->lock in the put path. Quick
inspection suggests it should actually be safe to do at that point,
and we do something similar for idle_balance().

Given that neither of the above two factors is a hard requirement,
this could be moved out of a timer and into the put path directly
(with the fact that we drop rq->lock strongly commented). I have no
strong preference between the two choices.

Uninteresting additional historical reason:
The /original/ requirement for a timer here was that previous versions
placed some of the bandwidth distribution under cfs_b->lock. That
meant we couldn't take rq->lock while holding cfs_b->lock (the nesting
is the other way around). This is no longer a requirement
(advancement of expiration now provides what cfs_b->lock used to
provide here).

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/