Re: [RFC][PATCH v20 0/6] Donor Migration for Proxy Execution (v20)

From: John Stultz
Date: Wed Jul 23 2025 - 18:43:01 EST


On Wed, Jul 23, 2025 at 7:44 AM Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:
> On 22/07/25 07:05, John Stultz wrote:
> > Issues still to address with the full series:
> > * There’s a new quirk from recent changes for dl_server that
> > is causing the ksched_football test in the full series to hang
> > at boot. I’ve bisected and reverted the change for now, but I
> > need to better understand what’s going wrong.
>
> After our quick chat on IRC, I remembered that there were additional two
> fixes for dl-server posted, but still not on tip.
>
> https://lore.kernel.org/lkml/20250615131129.954975-1-kuyo.chang@xxxxxxxxxxxx/
> https://lore.kernel.org/lkml/20250627035420.37712-1-yangyicong@xxxxxxxxxx/
>
> So I went ahead and pushed them to
>
> git@xxxxxxxxxx:jlelli/linux.git upstream/fix-dlserver
>
> Could you please check if any (or both together) of the two topmost
> changes do any good to the issue you are seeing?

Thanks for sharing these! Unfortunately they don't seem to help. :/

I'm still digging down into the behavior. I'm not 100% sure the
problem isn't just my test logic starving itself (after creating
NR_CPU RT spinners, it's not surprising creating new threads might be
tough if the non-RT kthreadd can't get scheduled), but I don't quite
see how the dl_server patch cccb45d7c429 ("sched/deadline: Less
agressive dl_server handling") would be the cause of the dramatic
behavioral change - especially as this test was also functional prior
to the dl_server logic landing. Also, it's odd that just re-adding the
dl_server_stop() call removed from dequeue_entities() seems to make it
work again. So I clearly need to dig more to understand the behavior.

Thanks again for your suggestions! I'm going to dig further and let
folks know when I figure this detail out.

thanks
-john