Re: [PATCH 13/14] selftests/sched_ext: Add test for sched_ext dl_server
From: Christian Loehle
Date: Thu Oct 23 2025 - 11:02:10 EST
On 10/20/25 15:21, Christian Loehle wrote:
> On 10/20/25 14:55, Andrea Righi wrote:
>> Hi Christian,
>>
>> On Mon, Oct 20, 2025 at 02:26:17PM +0100, Christian Loehle wrote:
>>> On 10/17/25 10:26, Andrea Righi wrote:
>>>> Add a selftest to validate the correct behavior of the deadline server
>>>> for the ext_sched_class.
>>>>
>>>> [ Joel: Replaced occurrences of CFS in the test with EXT. ]
>>>>
>>>> Co-developed-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
>>>> Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
>>>> Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
>>>> ---
>>>> tools/testing/selftests/sched_ext/Makefile | 1 +
>>>> .../selftests/sched_ext/rt_stall.bpf.c | 23 ++
>>>> tools/testing/selftests/sched_ext/rt_stall.c | 214 ++++++++++++++++++
>>>> 3 files changed, 238 insertions(+)
>>>> create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c
>>>> create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c
>>>
>>>
>>> Does this pass consistently for you?
>>> For a loop of 1000 runs I'm getting total runtime numbers for the EXT task of:
>>>
>>> 0.000 - 0.261 | (7)
>>> 0.261 - 0.522 | ###### (86)
>>> 0.522 - 4.437 | (0)
>>> 4.437 - 4.698 | (1)
>>> 4.698 - 4.959 | ################### (257)
>>> 4.959 - 5.220 | ################################################## (649)
>>>
>>> I'll try to see what's going wrong here...
>>
>> Is that 1000 runs of total_bw? Yeah, the small ones don't look right at
>> all, unless they're caused by some errors in the measurement (or something
>> wrong in the test itself). Still better than without the dl_server, but
>> it'd be nice to understand what's going on. :)
>>
>> I'll try to reproduce that on my side as well.
>>
>
> Yes, it's pretty much:
> for i in $(seq 0 999); do ./runner -t rt_stall ; sleep 10; done
>
> I also tried increasing the runtime of the test, but the results look the same, so I
> assume the DL server isn't running in the failing cases.
>
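FWIW, the change below fixes the issue and also explains why the runtime of the test
was irrelevant: as far as I can tell, dl_server_on() never started the server when EXT
tasks were already runnable on the rq, so it stayed off for the whole run.
I wonder if we should let the test do FAIR->EXT->FAIR->EXT or something like that;
the change would be minimal and coverage would improve significantly IMO.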
-----8<-----
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c5f3c39972b6..ed48c681c4c2 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2568,6 +2568,8 @@ static void dl_server_on(struct rq *rq, bool switch_all)
 	err = dl_server_init_params(&rq->ext_server);
 	WARN_ON_ONCE(err);
+	if (rq->scx.nr_running)
+		dl_server_start(&rq->ext_server);
 	rq_unlock_irqrestore(rq, &rf);
 }
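
To illustrate the failure mode the hunk above addresses, here is a toy userspace model. It is not kernel code: struct toy_rq and the toy_*() helpers are hypothetical names made up for this sketch, and it assumes that the server is otherwise only started when the runqueue goes from zero to one runnable EXT task. Under that assumption, a task that is already queued when the server is switched on never triggers a start unless the switch-on path checks nr_running itself.

```c
#include <stdbool.h>

/* Toy model only; field names loosely mirror the kernel ones. */
struct toy_rq {
	unsigned int nr_running;	/* models rq->scx.nr_running */
	bool server_enabled;		/* set once toy_server_on() has run */
	bool server_active;		/* models a started rq->ext_server */
};

/*
 * Models an enqueue. ASSUMPTION: the server is started only on the
 * 0 -> 1 transition of nr_running, and only if it is already enabled.
 */
static void toy_enqueue(struct toy_rq *rq)
{
	if (rq->nr_running++ == 0 && rq->server_enabled)
		rq->server_active = true;
}

/* Models dl_server_on(); with_fix adds the nr_running check from the hunk. */
static void toy_server_on(struct toy_rq *rq, bool with_fix)
{
	rq->server_enabled = true;
	if (with_fix && rq->nr_running)
		rq->server_active = true;
}

/*
 * Queue a task before the server is switched on, then queue another one.
 * Returns whether the server ended up active.
 */
static bool toy_run(bool with_fix)
{
	struct toy_rq rq = { 0 };

	toy_enqueue(&rq);		/* task runnable before switch-on */
	toy_server_on(&rq, with_fix);
	toy_enqueue(&rq);		/* 1 -> 2: no 0 -> 1 transition, no start */
	return rq.server_active;
}
```

In this model toy_run(false) leaves the server inactive no matter how many more tasks are queued afterwards, while toy_run(true) starts it at switch-on time. That would match the observation that a longer test runtime did not change the results.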