Re: [tip: sched/core] sched/numa: Avoid migrating task to CPU-less node

From: Huang, Ying
Date: Tue Mar 01 2022 - 20:00:03 EST


Qian Cai <quic_qiancai@xxxxxxxxxxx> writes:

> On Thu, Feb 17, 2022 at 06:56:52PM -0000, tip-bot2 for Huang Ying wrote:
>> The following commit has been merged into the sched/core branch of tip:
>>
>> Commit-ID: 5c7b1aaf139dab5072311853bacc40fc3457d1f9
>> Gitweb: https://git.kernel.org/tip/5c7b1aaf139dab5072311853bacc40fc3457d1f9
>> Author: Huang Ying <ying.huang@xxxxxxxxx>
>> AuthorDate: Mon, 14 Feb 2022 20:15:53 +08:00
>> Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> CommitterDate: Wed, 16 Feb 2022 15:57:53 +01:00
>>
>> sched/numa: Avoid migrating task to CPU-less node
>>
>> In a typical memory tiering system, there's no CPU in slow (PMEM) NUMA
>> nodes. But if the number of the hint page faults on a PMEM node is
>> the max for a task, the current NUMA balancing policy may try to place
>> the task on the PMEM node instead of DRAM node. This is unreasonable,
>> because there's no CPU in PMEM NUMA nodes. To fix this, CPU-less
>> nodes are ignored when searching the migration target node for a task
>> in this patch.
>>
>> To test the patch, we run a workload that accesses more memory in the
>> PMEM node than in the DRAM node. Without the patch, the PMEM node is
>> chosen as the preferred node in task_numa_placement(). With the patch,
>> the DRAM node is chosen instead.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>> Link: https://lkml.kernel.org/r/20220214121553.582248-2-ying.huang@xxxxxxxxx
>
> Reverting this commit on the top of today's linux-next fixed a boot crash
> on arm64 NUMA systems.
>
> Unable to handle kernel paging request at virtual address ffff7a6601694aec
> KASAN: maybe wild-memory-access in range [0xffffd3300b4a5760-0xffffd3300b4a5767]
> Mem abort info:
> ESR = 0x96000005
> EC = 0x25: DABT (current EL), IL = 32 bits
> mlx5_core 0007:02:00.0: enabling device (0100 -> 0102)
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x05: level 1 translation fault
> Data abort info:
> ISV = 0, ISS = 0x00000005
> CM = 0, WnR = 0
> swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000400b3d6c6000
> [ffff7a6601694aec] pgd=0000403fc007f003, p4d=0000403fc007f003, pud=0000000000000000
> Internal error: Oops: 96000005 [#1] PREEMPT SMP
> Modules linked in: nouveau(+) drm_ttm_helper ttm nvme(+) drm_dp_helper drm_kms_helper mlx5_core(+) mpt3sas(+) xhci_pci(+) nvme_core raid_class xhci_pci_renesas drm
> CPU: 85 PID: 1308 Comm: udevadm Not tainted 5.17.0-rc6-next-20220301 #1
> pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : task_numa_placement
> lr : task_numa_placement
> sp : ffff800031047760
> x29: ffff800031047760 x28: ffff3fffab916c00 x27: 0000000000000020
> x26: 0000000000000001 x25: 0000000000000000 x24: 0000000000000000
> x23: ffff07ffe5289a80 x22: ffffd3300b4a5760 x21: 000000000000003f
> x20: ffffd32feb4a5768 x19: 0000000000000000 x18: ffff07ffe528ad88
> x17: ffffd32fe5693a1c x16: 0000000000000000 x15: ffff8000310478e0
> x14: ffff07ffe528ad90 x13: 0000000000000002 x12: dfff80000000000d
> x11: 0000000000000001 x10: 000000000000b6be x9 : 0000000000000000
> x8 : 00000000ffffffff x7 : ffffd32feb4a5780 x6 : 0000000000000000
> x5 : 0000000000000000 x4 : 0000000000000000 x3 : 1ffffa6601694aec
> x2 : 0000000000000000 x1 : dfff800000000000 x0 : 000000001ffffff8
> Call trace:
> task_numa_placement
> arch_test_bit at include/asm-generic/bitops/non-atomic.h:118
> (inlined by) node_state at include/linux/nodemask.h:416
> (inlined by) task_numa_placement at kernel/sched/fair.c:2439
> task_numa_fault
> do_numa_page
> handle_pte_fault
> __handle_mm_fault
> handle_mm_fault
> do_page_fault
> do_translation_fault
> do_mem_abort
> el0_da
> el0t_64_sync_handler
> el0t_64_sync
> Code: 8b000296 d2d00001 f2fbffe1 d343fec3 (38e16861)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception
> SMP: stopping secondary CPUs
> Kernel Offset: 0x532fdcf70000 from 0xffff800008000000
> PHYS_OFFSET: 0x80000000
> CPU features: 0x00,00042c0c,19801c82
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
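
For readers following the thread, the selection rule described in the quoted
commit message (pick the node with the most hint faults, ignoring CPU-less
nodes) can be sketched in userspace C. This is only an illustration under
stated assumptions: pick_preferred_node(), the faults[] and has_cpu[] arrays,
and NO_NODE are hypothetical names, not the kernel's task_numa_placement()
code. The sketch returns NO_NODE when no node qualifies, so a caller has to
check for it before using the result as an index. Whether an invalid node id
is what triggers the oops above is an assumption, not something confirmed in
this message.

```c
#define NO_NODE (-1)	/* hypothetical stand-in for "no node selected" */

/*
 * Return the id of the node with the highest hint-fault count among
 * nodes that have CPUs, or NO_NODE if no such node has any faults.
 * faults[] and has_cpu[] are hypothetical per-node inputs.
 */
int pick_preferred_node(const unsigned long faults[],
			const int has_cpu[], int nr_nodes)
{
	unsigned long max_faults = 0;
	int max_nid = NO_NODE;
	int nid;

	for (nid = 0; nid < nr_nodes; nid++) {
		/* CPU-less (e.g. PMEM) nodes are never a placement target. */
		if (!has_cpu[nid])
			continue;

		if (faults[nid] > max_faults) {
			max_faults = faults[nid];
			max_nid = nid;
		}
	}

	return max_nid;
}
```

With fault counts {10, 40, 500, 0} and CPUs only on nodes 0 and 1, the
function returns 1 even though node 2 has the most faults. If no node has
CPUs, it returns NO_NODE, which must not be passed on to a bitmap lookup.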

Thanks for reporting! Could you check whether the following debug patch fixes the issue?

Best Regards,
Huang, Ying

----------------------------8<-------------------------------------------