Re: [PATCH v4 0/3] mm/oom_kill: Only delay OOM reaper for processes using robust futexes
From: zhongjinji
Date: Fri Aug 15 2025 - 13:08:42 EST
On Thu, 14 Aug 2025 21:55:52 +0800 <zhongjinji@xxxxxxxxx> wrote:
> > The OOM reaper quickly reclaims a process's memory when the system hits OOM,
> > helping the system recover. Without the OOM reaper, if a process frozen by
> > cgroup v1 is OOM killed, the victim's memory cannot be freed, leaving the
> > system in a poor state. Even if the process is not frozen by cgroup v1,
> > reclaiming victims' memory remains important, as having one more process
> > working speeds up memory release.
> >
> > When processes holding robust futexes are OOM killed but waiters on those
> > futexes remain alive, the robust futexes might be reaped before
> > futex_cleanup() runs. This can cause the waiters to block indefinitely [1].
> >
> > To prevent this issue, the OOM reaper's work is delayed by 2 seconds [1]. Since
> > many killed processes exit within 2 seconds, the OOM reaper rarely runs after
> > this delay. However, robust futex users are few, so delaying OOM reap for all
> > victims is unnecessary.
> >
> > If each thread's robust_list in a process is NULL, the process holds no robust
> > futexes. For such processes, the OOM reaper should not be delayed. For
> > processes holding robust futexes, to avoid issue [1], the OOM reaper must
> > still be delayed.
> >
> > Patch 1 introduces process_has_robust_futex() to detect whether a process uses
> > robust futexes. Patch 2 delays the OOM reaper only for processes holding robust
> > futexes, improving OOM reaper performance. Patch 3 makes the OOM reaper and
> > exit_mmap() traverse the maple tree in opposite orders to reduce PTE lock
> > contention caused by unmapping the same vma.
>
> This all sounds sensible, given that we appear to be stuck with the
> 2-second hack.
>
> What prevents one of the process's threads from creating a robust mutex
> after we've inspected it with process_has_robust_futex()?
Thank you, I had not considered this case.

process_has_robust_futex() is called only after the kill signal has been
sent, so the victim already has SIGNAL_GROUP_EXIT set in
task->signal->flags by the time it is inspected.

We can therefore check for SIGNAL_GROUP_EXIT in set_robust_list(), so that
a process which is already being killed cannot create a new robust mutex
after the inspection.
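
To make the ordering argument concrete, here is a small userspace model of
the idea. It is only a sketch, not kernel code and not the patch itself:
the *_model types and functions, the MODEL_SIGNAL_GROUP_EXIT value and the
-EINVAL return code are all assumptions made for illustration. It also
ignores the locking and ordering that the real check against the flag
update would need.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's SIGNAL_GROUP_EXIT bit (assumed value). */
#define MODEL_SIGNAL_GROUP_EXIT 0x4u

/* Hypothetical placeholder for a registered robust list head. */
struct robust_list_head_model {
	int unused;
};

/* One thread of the modelled process; NULL means no robust list registered. */
struct thread_model {
	struct robust_list_head_model *robust_list;
};

/* Hypothetical process: shared signal flags plus its threads. */
struct process_model {
	unsigned int signal_flags;
	struct thread_model *threads;
	size_t nr_threads;
};

/* Models the detection rule: any thread with a non-NULL robust_list counts. */
bool process_has_robust_futex_model(const struct process_model *proc)
{
	assert(proc != NULL);
	for (size_t i = 0; i < proc->nr_threads; i++) {
		if (proc->threads[i].robust_list != NULL)
			return true;
	}
	return false;
}

/* Models sending the kill signal: the group-exit flag is set first. */
void oom_kill_model(struct process_model *proc)
{
	assert(proc != NULL);
	proc->signal_flags |= MODEL_SIGNAL_GROUP_EXIT;
}

/*
 * Models the proposed check: once the process is being killed, registering
 * a robust list is refused. The error code is a hypothetical choice.
 */
int set_robust_list_model(struct process_model *proc, size_t tid,
			  struct robust_list_head_model *head)
{
	assert(proc != NULL && tid < proc->nr_threads);
	if (proc->signal_flags & MODEL_SIGNAL_GROUP_EXIT)
		return -EINVAL;
	proc->threads[tid].robust_list = head;
	return 0;
}
```

In this model, once oom_kill_model() has run, a later
set_robust_list_model() call fails, so the answer given by
process_has_robust_futex_model() after the kill cannot change from false
to true.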