Re: [PATCH] sched/deadline: Throttle a constrained task activated if overflow

From: Xunlei Pang
Date: Tue Apr 11 2017 - 03:05:22 EST


On 04/11/2017 at 01:53 PM, Xunlei Pang wrote:
> On 04/11/2017 at 04:47 AM, Daniel Bristot de Oliveira wrote:
>> On 04/10/2017 11:22 AM, Xunlei Pang wrote:
>>> I was testing Daniel's changes with his test case in the commit
>>> df8eac8cafce ("sched/deadline: Throttle a constrained deadline
>>> task activated after the deadline"), and tweaked it a little.
>>>
>>> Instead of having the runtime equal to the deadline, I tweaked
>>> runtime, deadline and sleep value to ensure every time it calls
>>> dl_check_constrained_dl() with "dl_se->deadline > rq_clock(rq)"
>>> as well as true dl_entity_overflow(), so it does replenishing
>>> every wake up in update_dl_entity(), and break its bandwidth.
>>>
>>> Daniel's test case had:
>>> attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms */
>>> attr.sched_deadline = 2 * 1000 * 1000; /* 2 ms*/
>>> attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */
>>> ts.tv_sec = 0;
>>> ts.tv_nsec = 2000 * 1000; /* 2 ms */
>>>
>>> I changed it to:
>>> attr.sched_runtime = 5 * 1000 * 1000; /* 5 ms */
>>> attr.sched_deadline = 7 * 1000 * 1000; /* 7 ms */
>>> attr.sched_period = 1 * 1000 * 1000 * 1000; /* 1 s */
>>> ts.tv_sec = 0;
>>> ts.tv_nsec = 1000 * 1000; /* 1 ms */
>>>
>>> The change above can result in over 25% of the CPU on my machine.
>>>
>>> In order to avoid the beakage, we improve dl_check_constrained_dl()
>>> to prevent dl tasks from being activated until the next period if it
>>> runs out of bandwidth of the current period.
>> The problem now is that, with your patch, we will throttle the task
>> with some possible runtime. Moreover, the task did not brake any
>> rule, like being awakened after the deadline - the user-space is not
>> misbehaving.
>>
>> That is +- what the reproducer is doing when using your patch,
>> (I put some trace_printk when noticing the overflow in the wakeup).
>>
>> <idle>-0 [007] d.h. 1505.066439: enqueue_task_dl: my current runtime is 3657361 and the deadline is 4613027 from now
>> <idle>-0 [007] d.h. 1505.066439: enqueue_task_dl: my dl_runtime is 5000000
>>
>> and so the task will be throttled with 3657361 ns runtime available.
>>
>> As we can see, it is really breaking the density:
>>
>> 5ms / 7ms (.714285) < 3657361 / 4613027 (.792833)
>>
>> Well, it is not breaking that much. Trying to be less pessimist, we can
>> compute a new runtime with following equation:
>>
>> runtime = (dl_runtime / dl_deadline) * (deadline - now)
>>
>> That is, a runtime which fits in the task's density.
> This is a good point, to make the best use of remaining deadline, let me think more.

I don't know if this will make things more complicated, we can see in update_dl_entity():
if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
dl_se->runtime = pi_se->dl_runtime;
}

Looks like we have similar issue for non-constrained tasks in case of true dl_entity_overflow(), althought
its deadline is promoted(BTW, I think when overflow it should be dl_se->deadline += pi_se->dl_deadline?),
the previous runtime is discarded, we may need to apply the same runtime truncating logic on it as well
if we want to truncate runtime.

Regards,
Xunlei

>> In code:
>>
>> dl_se->runtime = (div64_u64(dl_se->dl_runtime << 20, dl_se->dl_deadline) * (dl_se->deadline - rq_clock(rq))) >> 20;
>>
>> For example (a trace_printk) showing the adjusted runtime for the
>> previous task:
>> <idle>-0 [007] d.h. 1505.066440: enqueue_task_dl: I can still run for 3294208 (it is not that bad)
>>
>> With the adjusted runtime, we have the following density:
>>
>> 3294208 / 4613027 = .714109
>>
>> as .714109 < .714285
>>
>> VoilÃ. We can still use 3.2 ms of runtime! Not bad at all.
>>
>> However, even this result is, somehow, controversial. Because we are
>> reducing the task's reservation (Q/P). But that is very close to the
>> implicit deadline behavior: when it "overflows", the runtime is truncated
>> (a replenishment...) and the deadline is postponed. In this case, we do
>> not need to postpone the deadline, just to "truncate" the runtime to fit
>> in the density... it is not that bad.
>>
>> Another possibility is not to replenish a constrained deadline
>> task twice in a period. In this case, the task would run further
>> the deadline, potentially causing problems for implicit deadline tasks.
>> But! even implicit deadline would do it in a overloaded system. The
>> problem is that, by using the density the system easily becomes
>> overloaded (too pessimistic).
>>
>> Resuming (so far):
>>
>> 1) We can be pessimistic to the constrained deadline task,
>> with Xunlei's patch;
>> 2) Compute a new runtime to fit in the density - somehow pessimistic.
>> 3) Avoid replenishing a constrained deadline before the next period.
>> but the system will become overload very easily (density).
>>
>> I think the option 2 is the mid-term.
>> I am showing a _proof of concept_ patch bellow. I is based in the
>> Xunlei's changes in the verification. I need to polish it... but...
>> you guys can read the idea in C.
>>
>> I have the 3) too, but I am not sure if it is as good as 2.
>>
>> We still need to think more about the solution.... test more... I will
>> continue working on this tomorrow, discussing with luca and tommaso
>> as well.
>>
>> Thoughts? Am I missing something (probably, I am tired :-) )?
>>
>> ---
>> kernel/sched/deadline.c | 53 ++++++++++++++++++++++++++++---------------------
>> 1 file changed, 30 insertions(+), 23 deletions(-)
>>
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 7508129..6fa4887 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -696,34 +696,38 @@ void init_dl_task_timer(struct sched_dl_entity *dl_se)
>> }
>>
>> /*
>> - * During the activation, CBS checks if it can reuse the current task's
>> - * runtime and period. If the deadline of the task is in the past, CBS
>> - * cannot use the runtime, and so it replenishes the task. This rule
>> - * works fine for implicit deadline tasks (deadline == period), and the
>> - * CBS was designed for implicit deadline tasks. However, a task with
>> - * constrained deadline (deadine < period) might be awakened after the
>> - * deadline, but before the next period. In this case, replenishing the
>> - * task would allow it to run for runtime / deadline. As in this case
>> - * deadline < period, CBS enables a task to run for more than the
>> - * runtime / period. In a very loaded system, this can cause a domino
>> - * effect, making other tasks miss their deadlines.
>> - *
>> - * To avoid this problem, in the activation of a constrained deadline
>> - * task after the deadline but before the next period, throttle the
>> - * task and set the replenishing timer to the begin of the next period,
>> - * unless it is boosted.
>> + * XXX: Daniel will document it better in a clean patch, this is
>> + * a proof of concept.
>> */
>> static inline void dl_check_constrained_dl(struct sched_dl_entity *dl_se)
>> {
>> struct task_struct *p = dl_task_of(dl_se);
>> struct rq *rq = rq_of_dl_rq(dl_rq_of_se(dl_se));
>> + u64 clock = rq_clock(rq);
>>
>> - if (dl_time_before(dl_se->deadline, rq_clock(rq)) &&
>> - dl_time_before(rq_clock(rq), dl_next_period(dl_se))) {
>> - if (unlikely(dl_se->dl_boosted || !start_dl_timer(p)))
>> - return;
>> - dl_se->dl_throttled = 1;
>> + /* we are in a new period */
>> + if (dl_time_before(dl_next_period(dl_se), rq_clock(rq)))
>> + return;
>> +
>> + /* are we ok with the deadline and density? */
>> + if (dl_time_before(rq_clock(rq), dl_se->deadline) &&
>> + !dl_entity_overflow(dl_se, dl_se, rq_clock(rq)))
>> + return;
>> +
>> + /* is the density the problem? */
>> + if (dl_entity_overflow(dl_se, dl_se, clock)) {
>> + /* reduces the runtime so it fits in the density */
>> + dl_se->runtime =
>> + (div64_u64(dl_se->dl_runtime << 20, dl_se->dl_deadline) *
>> + (dl_se->deadline - clock)) >> 20;
>> + WARN_ON(dl_se->runtime < 0);
>> + return;
>> }
>> +
>> + if (!start_dl_timer(p))
>> + return;
>> +
>> + dl_se->dl_throttled = 1;
>> }
>>
>> static
>> @@ -996,8 +1000,11 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>> * If that is the case, the task will be throttled and
>> * the replenishment timer will be set to the next period.
>> */
>> - if (!dl_se->dl_throttled && dl_is_constrained(dl_se))
>> - dl_check_constrained_dl(dl_se);
>> + if (!p->dl.dl_boosted &&
>> + !p->dl.dl_throttled &&
>> + dl_is_constrained(&p->dl))
>> + dl_check_constrained_dl(&p->dl);
>> +
>>
>> /*
>> * If p is throttled, we do nothing. In fact, if it exhausted