[PATCH mm-unstable v1] mm: multi-gen LRU: fix crash during cgroup migration

From: Yu Zhao
Date: Sun Jan 15 2023 - 22:46:44 EST


lru_gen_migrate_mm() assumes that lru_gen_add_mm() has already run.
This isn't true for the following scenario:

    CPU 1                         CPU 2

  clone()
    cgroup_can_fork()
                                cgroup_procs_write()
    cgroup_post_fork()
                                  task_lock()
                                  lru_gen_migrate_mm()
                                  task_unlock()
      task_lock()
      lru_gen_add_mm()
      task_unlock()

When the above happens, the kernel crashes because of linked list
corruption (mm_struct->lru_gen.list).

Link: https://lore.kernel.org/r/20230115134651.30028-1-msizanoen@xxxxxxxxxxx/
Reported-by: msizanoen <msizanoen@xxxxxxxxxxx>
Tested-by: msizanoen <msizanoen@xxxxxxxxxxx>
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Cc: stable@xxxxxxxxxxxxxxx # v6.1+
Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
---
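The ordering in the commit message can be replayed with a small
user-space model. This is only a sketch and not kernel code: model_mm,
model_memcg, the node_*() helpers and the model_*() functions are
hypothetical stand-ins for mm->lru_gen, the per-memcg mm list and the
two lru_gen_*() calls. It assumes that the harm in the unguarded case
comes from linking the same mm twice, and shows that with the NULL
check an early migration becomes a no-op, so the later addition links
the mm exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
struct list_node { struct list_node *prev, *next; };
struct model_memcg { struct list_node mm_list; };

struct model_mm {
	struct list_node list;		/* stands in for mm->lru_gen.list */
	struct model_memcg *memcg;	/* stands in for mm->lru_gen.memcg */
};

static void node_init(struct list_node *n) { n->prev = n->next = n; }
static bool node_unlinked(const struct list_node *n) { return n->next == n; }

static void node_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Count entries on a list; the bound keeps the walk finite if links are bad. */
static int list_len(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *p = head->next; p != head && n < 16; p = p->next)
		n++;
	return n;
}

static void model_init_mm(struct model_mm *mm)
{
	node_init(&mm->list);
	mm->memcg = NULL;
}

/* Models the addition step: record the owner and link the mm once. */
static void model_add_mm(struct model_mm *mm, struct model_memcg *memcg)
{
	assert(node_unlinked(&mm->list));	/* linking twice would corrupt the list */
	mm->memcg = memcg;
	node_add_tail(&mm->list, &memcg->mm_list);
}

/* Models the migration step, including the guard added by this patch. */
static bool model_migrate_mm(struct model_mm *mm, struct model_memcg *target)
{
	if (!mm->memcg)		/* migration can happen before addition */
		return false;
	if (mm->memcg == target)
		return false;

	node_del_init(&mm->list);
	mm->memcg = NULL;
	model_add_mm(mm, target);
	return true;
}

/* Replays the scenario: "CPU 2" migrates first, then "CPU 1" adds. */
static int replay_migrate_then_add(void)
{
	struct model_memcg b;
	struct model_mm mm;
	bool moved;

	node_init(&b.mm_list);
	model_init_mm(&mm);

	moved = model_migrate_mm(&mm, &b);	/* no-op: mm was never added */
	model_add_mm(&mm, &b);			/* the one and only link */

	return moved ? -1 : list_len(&b.mm_list);
}
```

Calling replay_migrate_then_add() from a test program should report a
single linked entry. Without the first check in model_migrate_mm(), the
early migration would link the mm itself and the assertion in
model_add_mm() would fire on the second call.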
mm/vmscan.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index cdf96aec39dc..394ff4962cbc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3335,13 +3335,16 @@ void lru_gen_migrate_mm(struct mm_struct *mm)
 	if (mem_cgroup_disabled())
 		return;
 
+	/* migration can happen before addition */
+	if (!mm->lru_gen.memcg)
+		return;
+
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(task);
 	rcu_read_unlock();
 	if (memcg == mm->lru_gen.memcg)
 		return;
 
-	VM_WARN_ON_ONCE(!mm->lru_gen.memcg);
 	VM_WARN_ON_ONCE(list_empty(&mm->lru_gen.list));
 
 	lru_gen_del_mm(mm);
--
2.39.0.314.g84b9a713c41-goog