Re: [RFC PATCH 1/1] sched/deadline: Fix RT task potential starvation when expiry time passed
From: Kuyo Chang
Date: Fri Jun 20 2025 - 22:55:41 EST
On Fri, 2025-06-20 at 17:22 +0200, Juri Lelli wrote:
>
> On 20/06/25 11:00, Kuyo Chang wrote:
>
> ...
>
> Thanks for the additional explanation.
>
> The way I understand it now is the following (of course please correct
> me if I am still not getting it :)
>
> - a dl_server is actively servicing NORMAL tasks, but suffers a lot of
>   IRQ load and cannot make much progress
> - it does make some progress anyway, but it reaches
>   update_curr_dl_se@throttle only when rq_clock is already past its
>   current deadline
> - the dl_runtime_exceeded() branch is entered, but start_dl_timer()
>   fails because the computed act is still in the past
> - enqueue_dl_entity(REPLENISH) calls replenish_dl_entity(), which tries
>   to add runtime and advance the deadline, but time has moved on so far
>   that the deadline is still behind rq_clock(), and so "DL replenish ..."
>   is printed
> - replenish_dl_new_period() updates runtime and deadline from the current
>   clock and the dl_server is put back to run (so it continues to run
>   over/starve FIFO tasks)
>
Yes, "DL replenish ..." is the critical clue for identifying the root
cause of this issue.
> It looks like your proposed fix might work in this particular corner
> case, but I am not 100% comfortable with not trying to replenish
> properly (catch up with runtime) at all. I wonder if we might then start
> missing some other corner case. Maybe we could try to catch this
> particular corner case before even attempting to start the dl_timer,
> since we know it will fail, and do something at that point?
>
You can think of the patch as more of an error-proofing mechanism, and so
far it has been working well on our platform.
However, it might be better to catch this particular corner case in
advance, before start_dl_timer() is attempted, to prevent the issue.
> Thanks,
> Juri
>