Re: [PATCH v4] sched/fair: do not scan twice in detach_tasks()

From: Shijie Huang
Date: Mon Jul 21 2025 - 23:26:44 EST



On 2025/7/21 19:25, Valentin Schneider wrote:
On 21/07/25 11:40, Vincent Guittot wrote:
On Mon, 21 Jul 2025 at 04:40, Huang Shijie
<shijie@xxxxxxxxxxxxxxxxxxxxxx> wrote:
detach_tasks() uses struct lb_env.loop_max as an env.src_rq->cfs_tasks
iteration count limit. It is however set without the source RQ lock held,
and besides detach_tasks() can be re-invoked after releasing and
re-acquiring the RQ lock per LBF_NEED_BREAK.

This means that env.loop_max and the actual length of env.src_rq->cfs_tasks
as observed within detach_tasks() can differ. This can cause some tasks to
why not setting env.loop_max only once rq lock is taken in this case ?

side note : by default loop_max <= loop_break

I thought so too and dismissed that due to LBF_NEED_BREAK, but I guess we
could still do something like:

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b9b4bbbf0af6f..eef3a0d341661 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11643,6 +11643,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
.dst_grpmask = group_balance_mask(sd->groups),
.idle = idle,
.loop_break = SCHED_NR_MIGRATE_BREAK,
+ .loop_max = UINT_MAX,
.cpus = cpus,
.fbq_type = all,
.tasks = LIST_HEAD_INIT(env.tasks),
@@ -11681,18 +11682,19 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
/* Clear this flag as soon as we find a pullable task */
env.flags |= LBF_ALL_PINNED;
if (busiest->nr_running > 1) {
+more_balance:
/*
* Attempt to move tasks. If sched_balance_find_src_group has found
* an imbalance but busiest->nr_running <= 1, the group is
* still unbalanced. ld_moved simply stays zero, so it is
* correctly treated as an imbalance.
*/
- env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
-
-more_balance:
rq_lock_irqsave(busiest, &rf);
update_rq_clock(busiest);
+
+ env.loop_max = min3(env.loop_max, sysctl_sched_nr_migrate, busiest->h_nr_running);

It should be busiest->nr_running? or businest->cfs.h_nr_queued?


do you mind I create a patch based on this one? or You create an official patch?


Thanks

Huang Shijie