[PATCH v5 1/1] sched/deadline: Fix dl_server runtime calculation formula

From: Kuyo Chang
Date: Tue Jul 01 2025 - 22:15:06 EST


From: kuyo chang <kuyo.chang@xxxxxxxxxxxx>

In our testing with a 6.12-based kernel on a big.LITTLE system, we
saw RT tasks being blocked from running on the LITTLE CPUs for
multiple seconds at a time, apparently by the dl_server. This far
exceeds the default configured runtime of 50ms per second.

This is due to the fair dl_server runtime calculation being scaled
for frequency & capacity of the cpu.

Consider the following case on a big.LITTLE architecture.
Assume the runtime is 50,000,000 ns, with the frequency and capacity
scale-invariance factors defined as:
  Frequency scale-invariance: 100
  Capacity scale-invariance: 50
First, by frequency scale-invariance, the runtime is scaled to
  50,000,000 * 100 >> 10 = 4,882,812
Then, by capacity scale-invariance, it is further scaled to
  4,882,812 * 50 >> 10 = 238,418
So 50,000,000 ns of real runtime is accounted as only 238,418 ns.

This smaller "accounted runtime" value is what ends up being
subtracted from the fair server's runtime for the current period.
Thus after 50ms of real time, we've only accounted ~238us against the
fair server's runtime. The 209:1 ratio in this example means that on
the smaller cpu the fair server is allowed to continue running,
blocking RT tasks, for over 10 seconds before it exhausts its supposed
50ms of runtime. And on other hardware configurations it can be even
worse.

For the fair deadline_server, to prevent realtime tasks from being
unexpectedly delayed, we really do want to use fixed time, and not
scaled time for smaller capacity/frequency cpus. So remove the scaling
from the fair server's accounting to fix this.

Signed-off-by: kuyo chang <kuyo.chang@xxxxxxxxxxxx>
Acked-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Suggested-by: John Stultz <jstultz@xxxxxxxxxx>
Tested-by: John Stultz <jstultz@xxxxxxxxxx>

---
v1: https://lore.kernel.org/all/20250614020524.631521-1-kuyo.chang@xxxxxxxxxxxx/
v2: https://lore.kernel.org/all/20250617155355.1479777-1-kuyo.chang@xxxxxxxxxxxx/
v3: https://lore.kernel.org/all/20250626030746.2245365-1-kuyo.chang@xxxxxxxxxxxx/
v4: https://lore.kernel.org/all/20250627022837.3331827-1-kuyo.chang@xxxxxxxxxxxx/

v1->v2
Use the dl_server flag to distinguish scaled from non-scaled accounting, as suggested by Peter.
v2->v3
Use the dl_server(dl_se) helper function to refactor the code, as suggested by John.
v3->v4
Clean up and simplify the commit log, as suggested by John.
v4->v5
Drop the conditional for fair_server of time_scale.
The original version of this patch (v1) is much cleaner,
as suggested by John and acked by Juri.

---
kernel/sched/deadline.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245..89019a140826 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1504,7 +1504,9 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 	if (dl_entity_is_special(dl_se))
 		return;

-	scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
+	scaled_delta_exec = delta_exec;
+	if (!dl_server(dl_se))
+		scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);

 	dl_se->runtime -= scaled_delta_exec;

@@ -1611,7 +1613,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
  */
 void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
 {
-	s64 delta_exec, scaled_delta_exec;
+	s64 delta_exec;

 	if (!rq->fair_server.dl_defer)
 		return;
@@ -1624,9 +1626,7 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
 	if (delta_exec < 0)
 		return;

-	scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
-
-	rq->fair_server.runtime -= scaled_delta_exec;
+	rq->fair_server.runtime -= delta_exec;

 	if (rq->fair_server.runtime < 0) {
 		rq->fair_server.dl_defer_running = 0;
--
2.45.2
