Re: [PATCH v3] sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks

From: John Stultz
Date: Wed Apr 30 2025 - 18:04:31 EST


On Wed, Apr 30, 2025 at 5:43 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Apr 29, 2025 at 08:07:26AM -0700, John Stultz wrote:
> > It was reported that in 6.12, smpboot_create_threads() was
> > taking much longer than in 6.6.
> >
> > I narrowed down the call path to:
> > smpboot_create_threads()
> >  -> kthread_create_on_cpu()
> >     -> kthread_bind()
> >        -> __kthread_bind_mask()
> >           -> wait_task_inactive()
> >
> > Where in wait_task_inactive() we were regularly hitting the
> > queued case, which sets a 1 tick timeout, which when called
> > multiple times in a row, accumulates quickly into a long
> > delay.
> >
> > I noticed disabling the DELAY_DEQUEUE sched feature recovered
> > the performance, and it seems the newly created tasks are usually
> > sched_delayed and left on the runqueue.
> >
> > So in wait_task_inactive() when we see the task
> > p->se.sched_delayed, manually dequeue the sched_delayed task
> > with DEQUEUE_DELAYED, so we don't have to constantly wait a
> > tick.
>
> ---
>
> (that is, I'll trim the Changelog at this point, seeing how the rest is
> 'discussion')
>

Ah, thanks. I've noted you tweaking my commit messages before merging,
so I'll try to do better about leaving ephemeral notes (and Cc lists,
apparently) after the "---" fold.
My apologies for the trouble!


> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index c81cf642dba05..b986cd2fb19b7 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2283,6 +2283,12 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
> >  		 * just go back and repeat.
> >  		 */
> >  		rq = task_rq_lock(p, &rf);
> > +		/*
> > +		 * If task is sched_delayed, force dequeue it, to avoid always
> > +		 * hitting the tick timeout in the queued case
> > +		 */
> > +		if (p->se.sched_delayed)
> > +			dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
> >  		trace_sched_wait_task(p);
> >  		running = task_on_cpu(rq, p);
> >  		queued = task_on_rq_queued(p);
>
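
For anyone following along, here is a rough userspace sketch of why the
queued case adds up. It is a toy model, not kernel code: struct toy_task,
toy_wait_task_inactive(), toy_create_threads() and the
ticks_until_dequeue field are all made up for illustration, and the
assumption that a delayed task drops off the runqueue after one tick is
just a stand-in for whatever the scheduler would really do.

```c
#include <stdbool.h>

/* Toy stand-in for the task state the wait loop inspects (hypothetical). */
struct toy_task {
	bool queued;              /* models task_on_rq_queued(p) */
	bool sched_delayed;       /* models p->se.sched_delayed */
	int  ticks_until_dequeue; /* assumption: ticks until it leaves the rq by itself */
};

/* Returns how many one-tick sleeps were needed before p looked inactive. */
static unsigned int toy_wait_task_inactive(struct toy_task *p, bool force_dequeue)
{
	unsigned int ticks_slept = 0;

	for (;;) {
		/* Models the patch: dequeue a sched_delayed task right away. */
		if (force_dequeue && p->sched_delayed) {
			p->sched_delayed = false;
			p->queued = false;
		}

		if (!p->queued)
			return ticks_slept;

		/* Queued case: sleep for one tick, then look again. */
		ticks_slept++;
		if (p->ticks_until_dequeue > 0)
			p->ticks_until_dequeue--;
		if (p->ticks_until_dequeue == 0) {
			p->sched_delayed = false;
			p->queued = false;
		}
	}
}

/* Models creating nr_threads threads back to back, each needing one wait. */
static unsigned int toy_create_threads(unsigned int nr_threads, bool force_dequeue)
{
	unsigned int total_ticks = 0;
	unsigned int i;

	for (i = 0; i < nr_threads; i++) {
		struct toy_task t = {
			.queued = true,
			.sched_delayed = true,
			.ticks_until_dequeue = 1,
		};

		total_ticks += toy_wait_task_inactive(&t, force_dequeue);
	}
	return total_ticks;
}
```

In this model every newly created task costs one tick without the forced
dequeue and none with it, so the total delay grows linearly with the
number of threads being created.
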
> Let's just do this. I'll stick it in queue/sched/core.

Ok, thanks so much!
-john