Re: [RFC PATCH] sched/deadline: Avoid dl_server boosting with expired deadline

From: Peter Zijlstra

Date: Tue Oct 14 2025 - 06:25:46 EST


On Tue, Oct 14, 2025 at 12:05:06PM +0200, Gabriele Monaco wrote:
> On Tue, 2025-10-14 at 11:54 +0200, Peter Zijlstra wrote:
> > On Tue, Oct 07, 2025 at 02:29:04PM +0200, Gabriele Monaco wrote:
> > > Recent changes to the deadline server leave it running when the system
> > > is idle. If the system is idle for longer than the dl_server period and
> > > the first scheduling occurs after a fair task wakes up, the algorithm
> > > picks the server as the earliest deadline (in the past) and that boosts
> > > the fair task that just woke up while:
> > >  * the deadline is in the past
> > >  * the server consumed all its runtime (in background)
> > >  * there is no starvation (idle for about a period)
> > >
> > > Prevent the server from boosting a task when the deadline is in the
> > > past. Instead, replenish a new period and start the server as deferred.
> >
> > I'm a bit confused, should not enqueue ensure deadline is in the future?
> > And if it doesn't shouldn't we fix the enqueue path somewhere?
>
> Enqueue of a deadline task should handle the case, here the CPU is idle and the
> deadline server did not stop yet (and won't until the next schedule, if I'm not
> mistaken).
> The following enqueue of a fair task triggers a schedule where the server (no
> longer deferred) boosts the task straight away.
>
> Now the only check for deadline is in pick_next_dl_entity, where the earliest
> one is chosen, despite being in the past.
>
> Do you mean to check for deadline when enqueueing the fair task too? I believe
> again nothing happens here because the server is still up.
>
> Does it make sense or am I missing something?

Let's be confused together :-)

So dl_server is active, but machine is otherwise idle, this means
dl_server_timer is pending, right?

This timer is in one of two states:

- waiting for replenish; which will trigger and switch to 0-laxity.
- waiting for 0-laxity

So that 0-laxity case is the interesting one; when the machine really is
idle, no fair tasks will run and its runtime budget will not get
depleted. Therefore, once we hit 0-laxity, it will do
enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH).

This enqueue should ensure dl_se->deadline is in the future, right?
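
To make that expectation concrete, here is a rough userspace sketch of what
"replenish leaves the deadline in the future" could look like. It is a toy
model, not the kernel's replenish code: struct toy_dl_se and toy_replenish()
are made-up names, the fields only loosely mirror sched_dl_entity, and it
assumes an implicit deadline (relative deadline equal to the period).

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a deadline entity; NOT the kernel's sched_dl_entity. */
struct toy_dl_se {
	int64_t  runtime;     /* remaining budget in ns, may be negative after overrun */
	uint64_t deadline;    /* absolute deadline in ns */
	uint64_t dl_runtime;  /* budget granted per period */
	uint64_t dl_period;   /* period length (assumed equal to relative deadline) */
};

/* Hypothetical helper: on return, runtime > 0 and deadline > now. */
static void toy_replenish(struct toy_dl_se *se, uint64_t now)
{
	assert(se->dl_runtime > 0 && se->dl_period > 0);

	/* Postpone the deadline one period at a time until budget is positive. */
	while (se->runtime <= 0) {
		se->deadline += se->dl_period;
		se->runtime  += (int64_t)se->dl_runtime;
	}

	/* Lagged too far: the deadline is still stale, so start a fresh period. */
	if (se->deadline <= now) {
		se->deadline = now + se->dl_period;
		se->runtime  = (int64_t)se->dl_runtime;
	}
}
```

In the idle 0-laxity case described above the budget is not depleted, so the
loop does nothing and only the final stale-deadline check matters.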

Anyway, we run this deadline entity (there ain't nothing else to do
anyway), and it finds there aren't any fair tasks, it does
dl_server_stop().


Then, if a fair task wakes up (nr_running: 0->1) and dl_server isn't
active, we do dl_server_start(), which in turn does enqueue_dl_entity().
Now this enqueue is supposed to check if the dl_entity can still run:
does it still have time left in its current period? If not, it's
replenish timer time.
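
For reference, the "can it still run" question is the usual CBS wakeup test.
A minimal userspace sketch follows, under the assumption that the check is
"deadline already passed, or remaining runtime would exceed the reserved
bandwidth". The names toy_dl_params and toy_needs_new_period() are
hypothetical, and the plain multiplication is only safe for small toy values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy reservation parameters; names are illustrative only. */
struct toy_dl_params {
	uint64_t dl_runtime;  /* budget per period */
	uint64_t dl_period;   /* period length */
};

/*
 * Hypothetical helper: returns true when the current (runtime, deadline)
 * pair must not be reused at wakeup time and a new period is needed.
 */
static bool toy_needs_new_period(uint64_t runtime_left, uint64_t deadline,
				 uint64_t now, const struct toy_dl_params *p)
{
	assert(p->dl_period > 0);

	/* A deadline in the past can never be reused. */
	if (deadline <= now)
		return true;

	/*
	 * Bandwidth overflow test:
	 *   runtime_left / (deadline - now) > dl_runtime / dl_period
	 * cross-multiplied to avoid division (toy values, no overflow guard).
	 */
	return runtime_left * p->dl_period > p->dl_runtime * (deadline - now);
}
```

With dl_runtime = 10 and dl_period = 100, an entity holding 5 units of
runtime with 100 units to its deadline passes the test and keeps its
parameters, while one whose deadline is already behind now does not.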


So where exactly does the fair task start, and result in dl_se being
on_rq such that dl_se->deadline is in the past? That sounds like an enqueue
problem to me.