[PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork

From: Adam Li
Date: Thu Jul 17 2025 - 02:21:34 EST


Load imbalance is observed when a workload frequently forks new threads
and CPU affinity limits it to part of a sched group. In the example below,
the workload can run on CPUs 0-7 in the first group, but only on CPUs 8-11
in the second group. CPUs 12-15 are always idle.

{ 0  1  2  3  4  5  6  7 } { 8  9 10 11 12 13 14 15 }
  *  *  *  *  *  *  *  *     *  *  *  *

When looking for the dst group for a newly forked thread,
update_sg_wakeup_stats() often reports that the second group has more idle
CPUs than the first group, because it also counts CPUs 12-15, which the
task cannot use. The scheduler therefore thinks the second group is less
busy and selects the least busy CPU among CPUs 8-11. As a result, CPUs 8-11
can be crowded with newly forked threads while CPUs 0-7 are idle.

The first patch 'Only update stats for allowed CPUs when looking for dst
group' *alone* can fix this imbalance issue. With this patch, performance
improves significantly for workloads that fork tasks frequently and are
restricted to a subset of the CPUs in a sched group.
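
For illustration only, here is a user-space model of the effect, not the
kernel code. It assumes 16 CPUs stored as bits in a 16-bit mask, and all
type and function names in it are hypothetical. The real
update_sg_wakeup_stats() collects more than an idle CPU count. The sketch
only contrasts counting idle CPUs over the whole group with counting them
over the CPUs the task is allowed to use.

```c
#include <stdint.h>

/* Illustrative model only: one bit per CPU, 16 CPUs in total. */
typedef uint16_t cpumask16_t;

/* Number of CPUs set in a mask. */
static unsigned int mask_weight(cpumask16_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Idle CPUs seen when every CPU in the group is counted. */
static unsigned int idle_cpus_in_group(cpumask16_t group, cpumask16_t idle)
{
	return mask_weight(group & idle);
}

/* Idle CPUs seen when only CPUs the task may run on are counted. */
static unsigned int idle_cpus_allowed(cpumask16_t group, cpumask16_t idle,
				      cpumask16_t allowed)
{
	return mask_weight(group & idle & allowed);
}

/* Pick the group (0 or 1) with more idle CPUs; ties go to group 0. */
static int pick_idlest_group(unsigned int idle0, unsigned int idle1)
{
	return idle1 > idle0 ? 1 : 0;
}
```

Take groups 0x00FF and 0xFF00, allowed mask 0x0FFF (CPUs 0-11) and idle
mask 0xF0E0 (CPUs 5-7 and 12-15 idle). The unfiltered counts are 3 and 4,
so the model picks the second group even though all of its usable CPUs are
busy. The filtered counts are 3 and 0, so it picks the first group.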

And I think the second patch also makes sense in this scenario. If group
weight includes CPUs a task cannot use, group classification can be
incorrect.
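
A rough sketch of why the weight matters, again as a user-space model with
hypothetical names. It assumes classification compares the number of
running tasks against the group weight. The real classification in
kernel/sched/fair.c also looks at capacity and utilization, which this
sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only: one bit per CPU, 16 CPUs in total. */
typedef uint16_t cpuset16_t;

/* Number of CPUs in 'group' that are also in 'allowed'. */
static unsigned int allowed_group_weight(cpuset16_t group, cpuset16_t allowed)
{
	cpuset16_t mask = group & allowed;
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Simplified stand-in for the "has spare capacity" test: fewer running
 * tasks than CPUs counted in the group weight.
 */
static bool nr_running_below_weight(unsigned int sum_nr_running,
				    unsigned int group_weight)
{
	return sum_nr_running < group_weight;
}
```

For the second group (mask 0xFF00) with six running tasks, a weight of 8
(all CPUs in the group) makes the group look like it has spare capacity,
while a weight of 4 (only CPUs 8-11, allowed mask 0x0FFF) does not.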

Peter mentioned [1] that the second patch might also apply to
update_sg_lb_stats(). The third patch counts group weight from 'env->cpus'
(active CPUs). Group classification can be incorrect if group weight
includes inactive CPUs.

Peter also mentioned that update_sg_wakeup_stats() and update_sg_lb_stats()
are very similar and might be unified. The RFC patches 4-6 try to refactor
the two functions. The common logic is moved into a new function,
update_sg_stats().

I tested with the SPECjbb workload on an arm64 server. The patch set does
not introduce any observable performance change. But the test cannot cover
every code path. Please review.

v2:
Follow Peter's suggestions:
1) Apply the second patch to update_sg_lb_stats().
2) Refactor and unify update_sg_wakeup_stats() and update_sg_lb_stats().

v1:
https://lore.kernel.org/lkml/20250701024549.40166-1-adamli@xxxxxxxxxxxxxxxxxxxxxx/

links:
[1]: https://lore.kernel.org/lkml/20250704091758.GG2001818@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Adam Li (6):
  sched/fair: Only update stats for allowed CPUs when looking for dst
    group
  sched/fair: Only count group weight for allowed CPUs when looking for
    dst group
  sched/fair: Only count group weight for CPUs doing load balance when
    looking for src group
  sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL
    pointers
  sched/fair: Introduce update_sg_stats()
  sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats()

kernel/sched/fair.c | 274 ++++++++++++++++++++++++--------------------
1 file changed, 148 insertions(+), 126 deletions(-)

--
2.34.1