Re: [PATCH] sched/numa: Fix NULL pointer access to mm_struct during task swap

From: Chen, Yu C
Date: Thu Jul 03 2025 - 05:37:52 EST


Hi Michal,

On 7/3/2025 3:18 PM, Michal Hocko wrote:
On Thu 03-07-25 00:32:47, Chen Yu wrote:
It was reported that after commit ad6b26b6a0a7
("sched/numa: add statistics of numa balance task"),
a NULL pointer exception[1] occurs when accessing
p->mm. The following race condition was found to
trigger this bug: After a swap task candidate is
chosen during NUMA balancing, its mm_struct is
released due to task exit. Later, when the task
swapping is performed, p->mm is NULL, which causes
the problem:

CPU0                                    CPU1
:
...
task_numa_migrate
  task_numa_find_cpu
    task_numa_compare
      # a normal task p is chosen
      env->best_task = p

                                        # p exits:
                                        exit_signals(p);
                                          p->flags |= PF_EXITING
                                        exit_mm
                                          p->mm = NULL;

migrate_swap_stop
  __migrate_swap_task(arg->src_task, arg->dst_cpu)
    count_memcg_event_mm(p->mm, NUMA_TASK_SWAP) # p->mm is NULL

Fix this issue by checking if the task has the PF_EXITING
flag set in migrate_swap_stop(). If it does, skip updating
the memcg events. Additionally, log a warning if p->mm is
NULL to facilitate future debugging.

Fixes: ad6b26b6a0a7 ("sched/numa: add statistics of numa balance task")
Reported-by: Jirka Hladky <jhladky@xxxxxxxxxx>
Closes: https://lore.kernel.org/all/CAE4VaGBLJxpd=NeRJXpSCuw=REhC5LWJpC29kDy-Zh2ZDyzQZA@xxxxxxxxxxxxxx/
Reported-by: Srikanth Aithal <Srikanth.Aithal@xxxxxxx>
Reported-by: Suneeth D <Suneeth.D@xxxxxxx>
Suggested-by: Libo Chen <libo.chen@xxxxxxxxxx>
Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
---
kernel/sched/core.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8988d38d46a3..4e06bb955dad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3364,7 +3364,14 @@ static void __migrate_swap_task(struct task_struct *p, int cpu)
 {
 	__schedstat_inc(p->stats.numa_task_swapped);
 	count_vm_numa_event(NUMA_TASK_SWAP);
-	count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
+	/* exiting task has NULL mm */
+	if (!(p->flags & PF_EXITING)) {
+		WARN_ONCE(!p->mm, "swap task %d %s %x has no mm\n",
+			  p->pid, p->comm, p->flags);

As Andrew already said, this is not really acceptable because it is
very likely too easy to trigger: a) you do not want logs flooded with
warnings, and b) there are setups with panic_on_warn configured, and for
those this would be a fatal situation without any good reason.


OK, got it, thanks for pointing it out.

+
+		if (p->mm)
+			count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
+	}

Why are you testing for p->mm here? Isn't PF_EXITING test sufficient?
A robust way to guarantee non-NULL mm against races when a task is
exiting is find_lock_task_mm. Probably too heavy weight for this path.

I suppose we might only need to grab task_lock(p) and check whether its mm
pointer is NULL. If it is, could we skip the memcg event update without
scanning the other threads of the process for a non-NULL mm (as
find_lock_task_mm() does)? If the mm is non-NULL, we update the memcg event
with task_lock(p) held and release the lock afterwards.
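
To make the locking pattern concrete, here is a rough user-space model of
that idea. It is only a sketch, not the kernel change: struct task_model,
struct mm_model and both helpers are hypothetical names, a pthread mutex
stands in for task_lock(p)/task_unlock(p), and a plain counter stands in for
count_memcg_event_mm(). It assumes the exit path clears the mm pointer under
the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mm_struct plus its memcg event counter. */
struct mm_model {
	unsigned long numa_task_swap_events;
};

/* Hypothetical stand-in for task_struct. */
struct task_model {
	pthread_mutex_t alloc_lock;	/* plays the role of task_lock(p) */
	struct mm_model *mm;		/* cleared under alloc_lock on exit */
};

/* Models the exit path: detach the mm while holding the task lock. */
static void task_model_exit_mm(struct task_model *p)
{
	assert(p != NULL);
	pthread_mutex_lock(&p->alloc_lock);
	p->mm = NULL;
	pthread_mutex_unlock(&p->alloc_lock);
}

/*
 * Models the proposed accounting: check mm under the lock and update the
 * counter only if it is still attached. Returns true if an event was counted.
 */
static bool task_model_count_swap(struct task_model *p)
{
	bool counted = false;

	assert(p != NULL);
	pthread_mutex_lock(&p->alloc_lock);
	if (p->mm) {
		p->mm->numa_task_swap_events++;
		counted = true;
	}
	pthread_mutex_unlock(&p->alloc_lock);

	return counted;
}
```

Because the check and the update happen under one lock, the mm cannot be
detached between them in this model. Whether taking task_lock(p) is
acceptable on the real migrate_swap_stop() path is a separate question.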

thanks,
Chenyu

 	if (task_on_rq_queued(p)) {
 		struct rq *src_rq, *dst_rq;
--
2.25.1