Re: [PATCH RFC] sched/rt: preserve global runtime/period ratio in do_balance_runtime()

From: Michael Wang
Date: Wed May 22 2013 - 03:58:53 EST


Hi, Peter

On 05/22/2013 05:30 AM, Peter Boonstoppel wrote:
> RT throttling aims to prevent starvation of non-SCHED_FIFO threads
> when a rogue RT thread is hogging the CPU. It does so by piggybacking
> on the rt_bandwidth system and allocating at most rt_runtime per
> rt_period to SCHED_FIFO tasks (e.g. 950ms out of every second,
> allowing 'regular' tasks to run for at least 50ms every second).
>
> However, when multiple cores are available, rt_bandwidth allows cores
> to borrow rt_runtime from one another. This means that a core with a
> rogue RT thread, consuming 100% CPU cycles, can borrow enough runtime
> from other cores to allow the RT thread to run continuously, with no
> runtime for regular tasks on this core.
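
For reference, a rough user-space sketch of the borrowing described above. It is not the kernel code: struct toy_rt_rq and toy_balance_runtime() are hypothetical stand-ins, locking and RUNTIME_INF handling are left out, and the numbers assume the 950ms/1s example from the changelog on a 4-core root domain.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the struct rt_rq fields that matter here. */
struct toy_rt_rq {
	uint64_t rt_runtime;	/* RT budget for the current period, in ns */
	uint64_t rt_time;	/* RT time already consumed, in ns */
};

/*
 * Toy model of the pre-patch borrowing loop: CPU 'self' takes 1/n of
 * each neighbour's spare budget, limited only by the period itself.
 * Assumes rq[self].rt_runtime <= rt_period on entry.
 * Returns 1 if any runtime was borrowed, 0 otherwise.
 */
static int toy_balance_runtime(struct toy_rt_rq *rq, int n, int self,
			       uint64_t rt_period)
{
	int i, more = 0;

	assert(n > 0 && self >= 0 && self < n);

	for (i = 0; i < n; i++) {
		int64_t diff;

		if (i == self)
			continue;

		/* Spare budget on the neighbour, if any. */
		diff = (int64_t)rq[i].rt_runtime - (int64_t)rq[i].rt_time;
		if (diff <= 0)
			continue;

		diff /= n;
		if (rq[self].rt_runtime + (uint64_t)diff > rt_period)
			diff = (int64_t)(rt_period - rq[self].rt_runtime);

		rq[i].rt_runtime -= (uint64_t)diff;
		rq[self].rt_runtime += (uint64_t)diff;
		more = 1;

		/* Nothing left to gain once the whole period is ours. */
		if (rq[self].rt_runtime == rt_period)
			break;
	}
	return more;
}
```

With four queues at 950ms each, and only CPU 0 having used up its budget, a single call takes 50ms from CPU 1 and leaves CPU 0 with the full 1s period, which is the situation the changelog describes.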

IMHO, this kind of starvation should be attributed to the Admin...

Reserving CPU time would make 'realtime' a misnomer, and then the Admin
will blame the scheduler when his RT task sees higher latency...

Regards,
Michael Wang

>
> Although regular tasks can get scheduled on other available cores
> (which are guaranteed to have some non-RT runtime available, since they
> just lent some RT time to us), tasks that are specifically affined to
> a particular core may not be able to make progress (e.g. workqueues,
> timer functions). This can break e.g. watchdog-like functionality that
> is supposed to kill the rogue RT thread.
>
> This patch changes do_balance_runtime() in such a way that no core can
> acquire (borrow) more runtime than the globally set rt_runtime /
> rt_period ratio. This guarantees there will always be some non-RT
> runtime available on every individual core.
>
> Signed-off-by: Peter Boonstoppel <pboonstoppel@xxxxxxxxxx>
> ---
> kernel/sched/rt.c | 21 ++++++++++++++++++---
> 1 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 127a2c4..5ec4eab 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -571,11 +571,25 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
> struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
> int i, weight, more = 0;
> u64 rt_period;
> + u64 max_runtime;
>
> weight = cpumask_weight(rd->span);
>
> raw_spin_lock(&rt_b->rt_runtime_lock);
> rt_period = ktime_to_ns(rt_b->rt_period);
> +
> + /* Don't allow more runtime than global ratio */
> + if (global_rt_runtime() == RUNTIME_INF)
> + max_runtime = rt_period;
> + else
> + max_runtime = div64_u64(global_rt_runtime() * rt_period,
> + global_rt_period());
> +
> + if (rt_rq->rt_runtime >= max_runtime) {
> + raw_spin_unlock(&rt_b->rt_runtime_lock);
> + return more;
> + }
> +
> for_each_cpu(i, rd->span) {
> struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
> s64 diff;
> @@ -592,6 +606,7 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
> if (iter->rt_runtime == RUNTIME_INF)
> goto next;
>
> +
> /*
> * From runqueues with spare time, take 1/n part of their
> * spare time, but no more than our period.
> @@ -599,12 +614,12 @@ static int do_balance_runtime(struct rt_rq *rt_rq)
> diff = iter->rt_runtime - iter->rt_time;
> if (diff > 0) {
> diff = div_u64((u64)diff, weight);
> - if (rt_rq->rt_runtime + diff > rt_period)
> - diff = rt_period - rt_rq->rt_runtime;
> + if (rt_rq->rt_runtime + diff > max_runtime)
> + diff = max_runtime - rt_rq->rt_runtime;
> iter->rt_runtime -= diff;
> rt_rq->rt_runtime += diff;
> more = 1;
> - if (rt_rq->rt_runtime == rt_period) {
> + if (rt_rq->rt_runtime == max_runtime) {
> raw_spin_unlock(&iter->rt_runtime_lock);
> break;
> }
>
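
To make the effect of the quoted change easier to see, here is a minimal user-space sketch of the capped variant. It is only a model under stated assumptions: toy_max_runtime(), toy_balance_runtime_capped(), struct toy_rt_rq and TOY_RUNTIME_INF are hypothetical names, locking is omitted, and the global runtime and period are passed in as plain arguments instead of being read through global_rt_runtime() and global_rt_period().

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the "no limit" marker; the name is made up for this sketch. */
#define TOY_RUNTIME_INF (~0ULL)

/* Hypothetical stand-in for the struct rt_rq fields that matter here. */
struct toy_rt_rq {
	uint64_t rt_runtime;	/* RT budget for the current period, in ns */
	uint64_t rt_time;	/* RT time already consumed, in ns */
};

/*
 * Scale the global runtime/period ratio to this bandwidth's period.
 * The product is assumed to fit in 64 bits, which holds for values
 * around one second expressed in nanoseconds.
 */
static uint64_t toy_max_runtime(uint64_t global_runtime,
				uint64_t global_period, uint64_t rt_period)
{
	assert(global_period > 0);

	if (global_runtime == TOY_RUNTIME_INF)
		return rt_period;
	return global_runtime * rt_period / global_period;
}

/* Borrow 1/n of each neighbour's spare budget, never past max_runtime. */
static int toy_balance_runtime_capped(struct toy_rt_rq *rq, int n, int self,
				      uint64_t max_runtime)
{
	int i, more = 0;

	assert(n > 0 && self >= 0 && self < n);

	/* Already at or above the global ratio: do not borrow at all. */
	if (rq[self].rt_runtime >= max_runtime)
		return more;

	for (i = 0; i < n; i++) {
		int64_t diff;

		if (i == self)
			continue;

		diff = (int64_t)rq[i].rt_runtime - (int64_t)rq[i].rt_time;
		if (diff <= 0)
			continue;

		diff /= n;
		if (rq[self].rt_runtime + (uint64_t)diff > max_runtime)
			diff = (int64_t)(max_runtime - rq[self].rt_runtime);

		rq[i].rt_runtime -= (uint64_t)diff;
		rq[self].rt_runtime += (uint64_t)diff;
		more = 1;

		if (rq[self].rt_runtime == max_runtime)
			break;
	}
	return more;
}
```

With the 950ms/1s settings the cap works out to 950ms, so a queue that already holds its default budget borrows nothing, while a queue that previously lent runtime away can only borrow back up to 950ms.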
