Re: [PATCH] sched/deadline: Fix stale throttling on de-/boosted tasks

From: Juri Lelli
Date: Wed Sep 02 2020 - 02:00:42 EST


Hi,

On 31/08/20 13:07, Lucas Stach wrote:
> When a boosted task gets throttled, what normally happens is that it's
> immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the
> runtime and clears the dl_throttled flag. There is a special case however:
> if the throttling happened on sched-out and the task has been deboosted in
> the meantime, the replenish is skipped as the task will return to its
> normal scheduling class. This leaves the task with the dl_throttled flag
> set.
>
> Now if the task gets boosted up to the deadline scheduling class again
> while it is sleeping, it's still in the throttled state. The normal wakeup
> however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't
> actually place it on the rq. Thus we end up with a task that is runnable,
> but not actually on the rq and neither an immediate replenishment happens,
> nor is the replenishment timer set up, so the task is stuck in
> forever-throttled limbo.
>
> Clear the dl_throttled flag before dropping back to the normal scheduling
> class to fix this issue.
>
> Signed-off-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx>
> ---
> This is the root cause and fix of the issue described at [1]. After working
> on other stuff for the last few months, I finally was able to circle back
> to this issue and gather the required data to pinpoint the failure mode.
>
> [1] https://lkml.org/lkml/2020/3/20/765
> ---
> kernel/sched/deadline.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 3862a28cd05d..c19c1883d695 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1527,12 +1527,15 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> pi_se = &pi_task->dl;
> } else if (!dl_prio(p->normal_prio)) {
> /*
> - * Special case in which we have a !SCHED_DEADLINE task
> - * that is going to be deboosted, but exceeds its
> - * runtime while doing so. No point in replenishing
> - * it, as it's going to return back to its original
> - * scheduling class after this.
> + * Special case in which we have a !SCHED_DEADLINE task that is going
> + * to be deboosted, but exceeds its runtime while doing so. No point in
> + * replenishing it, as it's going to return back to its original
> + * scheduling class after this. If it has been throttled, we need to
> + * clear the flag, otherwise the task may wake up as throttled after
> + * being boosted again with no means to replenish the runtime and clear
> + * the throttle.
> */
> + p->dl.dl_throttled = 0;
> BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH);
> return;
> }

Ah, right, thanks for looking into this issue!

I wonder if we should be calling __dl_clear_params() instead of just
clearing dl_throttled, but what you propose makes sense to me.
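
For anyone following along, the sequence from the changelog can be
modelled in user space roughly as below. This is only a toy sketch, not
kernel code: struct toy_task, toy_enqueue(), toy_scenario(), the
apply_fix switch and the TOY_ENQUEUE_REPLENISH value are all made up for
illustration, and the branches only approximate what enqueue_task_dl()
does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value; not the kernel's ENQUEUE_REPLENISH. */
#define TOY_ENQUEUE_REPLENISH 0x1

/* Hypothetical stand-in for the bits of task state that matter here. */
struct toy_task {
	bool dl_throttled;	/* models p->dl.dl_throttled */
	bool boosted;		/* assumed: task still has a deadline-class PI donor */
	bool on_dl_rq;		/* actually queued on the deadline runqueue */
};

/* Rough model of the enqueue branches discussed in the changelog. */
static void toy_enqueue(struct toy_task *p, int flags, bool apply_fix)
{
	if (!p->boosted) {
		/* Deboost path: no replenishment, the task leaves the class. */
		assert(flags == TOY_ENQUEUE_REPLENISH);
		if (apply_fix)
			p->dl_throttled = false;	/* what the patch adds */
		return;
	}

	if (flags & TOY_ENQUEUE_REPLENISH)
		p->dl_throttled = false;	/* replenishing clears the throttle */

	if (p->dl_throttled)
		return;		/* still throttled: not placed on the rq */

	p->on_dl_rq = true;
}

/* Replays the changelog scenario; returns whether the task got queued. */
bool toy_scenario(bool apply_fix)
{
	struct toy_task t = { .dl_throttled = true };

	/* 1. Throttled on sched-out, already deboosted: replenish skipped. */
	toy_enqueue(&t, TOY_ENQUEUE_REPLENISH, apply_fix);

	/* 2. Boosted back into the deadline class while sleeping. */
	t.boosted = true;

	/* 3. Normal wakeup, ENQUEUE_REPLENISH not set. */
	toy_enqueue(&t, 0, apply_fix);

	return t.on_dl_rq;
}
```

Under these assumptions, toy_scenario(false) leaves the task off the rq
(the stale flag survives step 1), while toy_scenario(true) queues it on
the final wakeup.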

Acked-by: Juri Lelli <juri.lelli@xxxxxxxxxx>

Best,

Juri