On 16/07/25 10:13, Shijie Huang wrote:
> On 2025/7/16 1:04, Valentin Schneider wrote:
>> On 07/07/25 16:36, Huang Shijie wrote:
>>> When detach_tasks() scans the src_cpu's task list, the task list
>>> may shrink during the scanning. For example, the task list
>>> may have four tasks at the beginning, but may shrink to two
>>> during the scanning in detach_tasks():
>>>         Task list at beginning : "ABCD"
>>>         Task list in scanning  : "CD"
>>> ("ABCD" stands for different tasks.)
>>> In this scenario, env->loop_max is still four, so
>>> detach_tasks() may scan some tasks twice:
>>>         the scanning order may be : "DCDC"
>>
>> Huh, a quick hacky test suggests this isn't /too/ hard to trigger; I get
>> about one occurrence every two default hackbench runs (~200ms) on my dummy
>> QEMU setup.
>
> okay.
>
>> Have you seen this happen on your workloads or did you find this while
>> staring at the code?
>
> I found this issue in my Specjbb2015 test. It is very easy to trigger.
> I noticed it many times in the test.

That would be good to include in the changelog.

> I even found that:
>         At the beginning: env->loop_max is 12.
>         When detach_tasks() scans: the real task list has shrunk to 5.

That's set using rq->nr_running, which includes more than just the CFS
tasks, and looking at the git history it looks like that's almost always
been the case.
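To make that re-scan concrete before getting to a fix, here is a minimal
userspace sketch of the rotation. This is not the kernel code itself: it
only models the path where a task can't be migrated and is requeued at the
head of the list (as detach_tasks() does with list_move()), and the struct
and function names are made up for illustration.

#include <stdio.h>

struct task {
	char name;
	struct task *prev, *next;
};

/* Scan like detach_tasks(): pick from the tail, requeue at the head. */
static void scan(struct task *head, int loop_max)
{
	int loop = 0;

	while (head->next != head) {		/* while (!list_empty(tasks)) */
		struct task *p = head->prev;	/* list_last_entry(): the tail task */

		if (++loop > loop_max)		/* if (env->loop > env->loop_max) break */
			break;
		printf("%c", p->name);

		/* can_migrate_task() failed: list_move() the task to the head */
		p->prev->next = p->next;
		p->next->prev = p->prev;
		p->next = head->next;
		p->prev = head;
		head->next->prev = p;
		head->next = p;
	}
	printf("\n");
}

int main(void)
{
	struct task head = { 0, &head, &head };
	struct task c = { 'C', &head, NULL };
	struct task d = { 'D', &c, &head };

	/* The shrunken list from the changelog: head <-> C <-> D */
	c.next = &d;
	head.next = &c;
	head.prev = &d;

	scan(&head, 4);	/* stale loop_max from "ABCD": prints DCDC */
	scan(&head, 2);	/* loop_max matching the list length: prints DC */
	return 0;
}

The rotation never empties the list, so the list_empty() check alone can't
stop the scan; only the loop_max bound does, and a stale bound re-visits
tasks.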
Looks like env->loop_max really is only used for detach_tasks(), so perhaps
a "better" fix would be something like the below, so that we can't iterate
more than length(env->src_rq->cfs_tasks). That is, assuming
rq->cfs.h_nr_queued == length(rq->cfs_tasks)
---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b9b4bbbf0af6f..32ae24aa377ca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11687,7 +11687,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 		 * still unbalanced. ld_moved simply stays zero, so it is
 		 * correctly treated as an imbalance.
 		 */
-		env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
+		env.loop_max = min(sysctl_sched_nr_migrate, busiest->cfs.h_nr_queued);
 
 more_balance:
 		rq_lock_irqsave(busiest, &rf);
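In terms of the sketch further up, that clamp amounts to calling scan()
with a loop_max matching the CFS task count rather than rq->nr_running, so
the rotation visits each CFS task at most once ("DC" rather than "DCDC"),
modulo the rq->cfs.h_nr_queued == length(rq->cfs_tasks) assumption noted
above.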