Re: [BUG] Kernel panic in __migrate_swap_task() on 6.16-rc2 (NULL pointer dereference)

From: Chen, Yu C
Date: Fri Jun 27 2025 - 03:16:51 EST

Next message: Kevin Paul Reddy Janagari: "[PATCH] workaround for Sphinx false positive preventing index"
Previous message: Joerg Roedel: "Re: [PATCH] The sun50i_iommu_of_xlate() function didn't properly handle the case where of_find_device_by_node() returns NULL. This could lead to a NULL pointer dereference when accessing platform_get_drvdata(iommu_pdev) if the device node couldn't be found."
In reply to: Jirka Hladky: "Re: [BUG] Kernel panic in __migrate_swap_task() on 6.16-rc2 (NULL pointer dereference)"
Next in thread: Chen, Yu C: "Re: [BUG] Kernel panic in __migrate_swap_task() on 6.16-rc2 (NULL pointer dereference)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Jirka,

On 6/27/2025 5:46 AM, Jirka Hladky wrote:

Hi Chen and all,

we have now verified that the following commit causes a kernel panic
discussed in this thread:

ad6b26b6a0a79 sched/numa: add statistics of numa balance task

Reverting this commit fixes the issue.

I'm happy to help debug this further or test a proposed fix.

Thanks very much for your report, it seems that there is a
race condition that when the swap task candidate was chosen,
but its mm_struct get released due to task exit, then later
when doing the task swaping, the p->mm is NULL which caused
the problem:

CPU0 CPU1
:
...
task_numa_migrate
task_numa_find_cpu
task_numa_compare
# a normal task p is chosen
env->best_task = p

# p exit:
exit_signals(p);
p->flags |= PF_EXITING
exit_mm
p->mm = NULL;

migrate_swap_stop
__migrate_swap_task((arg->src_task, arg->dst_cpu)
count_memcg_event_mm(p->mm, NUMA_TASK_SWAP)# p->mm is NULL

Could you please help check if the following debug patch works,
and if there is no issue found after you ran several tests,
could you please provide the
/sys/kernel/debug/tracing/trace

BTW, is it possible to share your test script for stress-ng,
stream? It looks like the stress-ng's fork test case would
trigger this issue easier in theory.

thanks,
Chenyu

Thank you!
Jirka

On Wed, Jun 18, 2025 at 1:34 PM Jirka Hladky <jhladky@xxxxxxxxxx> wrote:

Hi Abhigyan,

The testing is done on bare metal. The kernel panics occur after
several hours of benchmarking.

Out of 20 servers, the problem has occurred on 6 of them:
intel-sapphire-rapids-gold-6448y-2s
intel-emerald-rapids-platinum-8558-2s
amd-epyc5-turin-9655p-1s
amd-epyc4-zen4c-bergamo-9754-1s
amd-epyc3-milan-7713-2s
intel-skylake-2s

The number in the name is the CPU model. 1s: single socket, 2s: dual socket.

We were not able to find a clear pattern. It appears to be a race
condition of some kind.

We run various performance benchmarks, including Linpack, Stream, NAS
(https://www.nas.nasa.gov/software/npb.html), and Stress-ng. Testing
is conducted with various thread counts and settings. All benchmarks
together are running ~24 hours. One benchmark takes ~4 hours. Please
also note that we repeat the benchmarks to collect performance
statistics. In many cases, kernel panic has occurred when the
benchmark was repeated.

Crash occurred while running these tests:
Stress_ng: Starting test 'fork' (#29 out of 41), number of threads 32,
iteration 1 out of 5
SPECjbb2005: Starting DEFAULT run with 4 SPECJBB2005 instances, each
with 24 warehouses, iteration 2 out of 3
Stress_ng: test 'sem' (#30 out of 41), number of threads 24, iteration
2 out of 5
Stress_ng: test 'sem' (#30 out of 41), number of threads 64, iteration
4 out of 5
SPECjbb2005: SINGLE run with 1 SPECJBB2005 instances, each with 128
warehouses, iteration 2 out of 3
Linpack: Benchmark-utils/linpackd, iteration 3, testType affinityRun,
number of threads 128
NAS: NPB_sources/bin/is.D.x

There is no clear benchmark triggering the kernel panic. Looping
Stress_ng's sem test looks, however, like it's worth trying.

I hope this helps. Please let me know if there's anything I can help
with to pinpoint the problem.

Thanks
Jirka

On Wed, Jun 18, 2025 at 7:19 AM Abhigyan ghosh
<zscript.team.zs@xxxxxxxxx> wrote:

Hi Jirka,

Thanks for the detailed report.

I'm curious about the specific setup in which this panic was triggered. Could you share more about the exact configuration or parameters you used for running `stress-ng` or Linpack? For instance:

- How many threads/cores were used?
- Was it running inside a VM, container, or bare-metal?
- Was this under any thermal throttling or power-saving mode?

I'd like to try reproducing it locally to study the failure further.

Best regards,
Abhigyan Ghosh

On 18 June 2025 1:35:30 am IST, Jirka Hladky <jhladky@xxxxxxxxxx> wrote:

Hi all,

I’ve encountered a reproducible kernel panic on 6.16-rc1 and 6.16-rc2
involving a NULL pointer dereference in `__migrate_swap_task()` during
CPU migration. This occurred on various AMD and Intel systems while
running a CPU-intensive workload (Linpack, Stress_ng - it's not
specific to a benchmark).

Full trace below:
---
BUG: kernel NULL pointer dereference, address: 00000000000004c8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 4078b99067 P4D 4078b99067 PUD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 74 UID: 0 PID: 466 Comm: migration/74 Kdump: loaded Not tainted
6.16.0-0.rc2.24.eln149.x86_64 #1 PREEMPT(lazy)
Hardware name: GIGABYTE R182-Z91-00/MZ92-FS0-00, BIOS M07 09/03/2021
Stopper: multi_cpu_stop+0x0/0x130 <- migrate_swap+0xa7/0x120
RIP: 0010:__migrate_swap_task+0x2f/0x170
Code: 41 55 4c 63 ee 41 54 55 53 48 89 fb 48 83 87 a0 04 00 00 01 65
48 ff 05 e7 14 dd 02 48 8b af 50 0a 00 00 66 90 e8 61 93 07 00 <48> 8b
bd c8 04 00 00 e8 85 11 35 00 48 85 c0 74 12 ba 01 00 00 00
RSP: 0018:ffffce79cd90bdd0 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff8e9c7290d1c0 RCX: 0000000000000000
RDX: ffff8e9c71e83680 RSI: 000000000000001b RDI: ffff8e9c7290d1c0
RBP: 0000000000000000 R08: 00056e36392913e7 R09: 00000000002ab980
R10: ffff8eac2fcb13c0 R11: ffff8e9c77997410 R12: ffff8e7c2fcf12c0
R13: 000000000000001b R14: ffff8eac71eda944 R15: ffff8eac71eda944
FS: 0000000000000000(0000) GS:ffff8eac9db4a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000004c8 CR3: 0000003072388003 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
<TASK>
migrate_swap_stop+0xe8/0x190
multi_cpu_stop+0xf3/0x130
? __pfx_multi_cpu_stop+0x10/0x10
cpu_stopper_thread+0x97/0x140
? __pfx_smpboot_thread_fn+0x10/0x10
smpboot_thread_fn+0xf3/0x220
kthread+0xfc/0x240
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0xf0/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---

**Kernel Version:**
6.16.0-0.rc2.24.eln149.x86_64 (Fedora rawhide)
https://koji.fedoraproject.org/koji/buildinfo?buildID=2732950

**Reproducibility:**
Happened multiple times during routine CPU-intensive operations. It
happens with various benchmarks (Stress_ng, Linpack) after several
hours of performance testing. `migration/*` kernel threads hit a NULL
dereference in `__migrate_swap_task`.

**System Info:**
- Platform: GIGABYTE R182-Z91-00 (dual socket EPYC)
- BIOS: M07 09/03/2021
- Config: Based on Fedora’s debug kernel (`PREEMPT(lazy)`)

**Crash Cause (tentative):**
NULL dereference at offset `0x4c8` from a task struct pointer in
`__migrate_swap_task`. Possibly an uninitialized or freed
`task_struct` field.

Please let me know if you’d like me to test a patch or if you need
more details.

Thanks,
Jirka

aghosh

--
-Jirka

Next message: Kevin Paul Reddy Janagari: "[PATCH] workaround for Sphinx false positive preventing index"
Previous message: Joerg Roedel: "Re: [PATCH] The sun50i_iommu_of_xlate() function didn't properly handle the case where of_find_device_by_node() returns NULL. This could lead to a NULL pointer dereference when accessing platform_get_drvdata(iommu_pdev) if the device node couldn't be found."
In reply to: Jirka Hladky: "Re: [BUG] Kernel panic in __migrate_swap_task() on 6.16-rc2 (NULL pointer dereference)"
Next in thread: Chen, Yu C: "Re: [BUG] Kernel panic in __migrate_swap_task() on 6.16-rc2 (NULL pointer dereference)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]