Re: [PATCH v3 00/10] sched/fair: Avoid unnecessary migrations within SMT domains

From: Zhang, Rui
Date: Thu Feb 09 2023 - 03:08:22 EST


Hi, All,

On Mon, 2023-02-06 at 20:58 -0800, Ricardo Neri wrote:
> Hi,
>
> This is v3 of this series. Previous versions can be found here [1]
> and
> here [2]. To avoid duplication, I do not include the cover letter of
> the
> original submission. You can read it in [1].

I happened to run into a similar issue when testing another patch
series which allows idle injections for partial cpus instead of all
cpus.
https://lore.kernel.org/all/a68a6f8c76cb719cd4865bd6aa726306772d4ee3.camel@xxxxxxxxx/

On an ADL-P NUC system with 4 Pcores (cpu0-cpu7), and 8 Ecores (cpu8-
cpu15), the problem can be reproduced by
1. start 16 stress threads
2. force idle injection to all Ecore cpus
3. stop idle injection after 10 seconds
After step 3, all the Pcore cpus are 100% busy, and all the Ecore cpus
are almost 100% idle. This situation lasts for a long time, till I kill
all the stress threads after 20 seconds.

After sync with Chen Yu, I also tried
stress -c 16 &
chrt -r 70 taskset -c 8-15 stress -c 8 -t 10
instead of idle injection, and the problem is also 100% reproducible.

And note that, the problem can be reproduced w/ and w/o ITMT enabled,
by poking /proc/sys/kernel/sched_itmt_enabled

With this whole patch series applied, I can confirm the problem is gone
both w/ and w/o ITMT enabled. So

Tested-by: Zhang Rui <rui.zhang@xxxxxxxxx>

thanks,
rui

>
> Changes since v2:
>
> Vincent correctly indicated that I was abusing asym_packing to force
> load
> balances unrelated to CPU priority. The underlying issue is that the
> scheduler cannot not handle load balances between SMT and non-SMT
> cores
> correctly. I added several prework patches to fix it... and I removed
> the
> abuse of asym_packing.
>
> Dietmar helped me to realize that there is a better way to check the
> idle
> state of SMT cores. Now I give the task to the scheduler instead of
> architecture-specific overrides. I unconditionally obey CPU
> priorities
> at the SMT level. This keeps Power7 happy. At upper levels (i.e.,
> when
> balancing load between cores) the scheduler also considers the idle
> state
> of the core in addition to CPU priority. This satisfies x86.
>
> Ionela spotted a violation of the scheduler topology sanity checks.
> We did
> not find a check that suits both Power7 and x86. For now, I removed
> the
> NEEDS_CHILD flag of SD_ASYM_PACKING.
>
> Hopefully, these patches are in sufficiently good shape to be merged.
>
> Thank you for your feedback and I look forward to getting more of it!
>
> New patches 2, 3, 4, 5, 6, 7, 8
> Updated patches: 1
> Unchanged patches: 9, 10
>
> BR,
> Ricardo
>
> [1].
> https://lore.kernel.org/lkml/20220825225529.26465-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
> [2].
> https://lore.kernel.org/lkml/20221122203532.15013-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
>
>
> Ricardo Neri (10):
> sched/fair: Generalize asym_packing logic for SMT cores
> sched/fair: Move is_core_idle() out of CONFIG_NUMA
> sched/fair: Only do asym_packing load balancing from fully idle SMT
> cores
> sched/fair: Let low-priority cores help high-priority busy SMT
> cores
> sched/fair: Keep a fully_busy SMT sched group as busiest
> sched/fair: Use the prefer_sibling flag of the current sched domain
> sched/fair: Do not even the number of busy CPUs via asym_packing
> sched/topology: Remove SHARED_CHILD from ASYM_PACKING
> x86/sched: Remove SD_ASYM_PACKING from the SMT domain flags
> x86/sched/itmt: Give all SMT siblings of a core the same priority
>
> arch/x86/kernel/itmt.c | 23 +----
> arch/x86/kernel/smpboot.c | 2 +-
> include/linux/sched/sd_flags.h | 5 +-
> kernel/sched/fair.c | 175 +++++++++++++++++------------
> ----
> 4 files changed, 99 insertions(+), 106 deletions(-)
>