[RFC 0/2] opportunistic memory reclaim of a killed process

From: Suren Baghdasaryan
Date: Wed Apr 10 2019 - 21:44:04 EST


The time to kill a process and free its memory can be critical when the
killing was done to prevent memory shortages affecting system
responsiveness.

In the case of Android, where processes can be restarted easily, killing a
less important background process is preferred to delaying or throttling
an interactive foreground process. At the same time unnecessary kills
should be avoided as they cause delays when the killed process is needed
again. This requires a balanced decision from the system software about
how long a kill can be postponed in the hope that memory usage will
decrease without such drastic measures.

As killing a process and reclaiming its memory is not an instant operation,
a margin of free memory has to be maintained to prevent system performance
deterioration while memory of the killed process is being reclaimed. The
size of this margin depends on the minimum reclaim rate to cover the
worst-case scenario and this minimum rate should be deterministic.
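
As a rough illustration of how the margin scales with the minimum reclaim
rate, here is a minimal sketch. The model is an assumption made for this
example and is not part of the patches: it treats the margin as the memory
other tasks allocate while the victim is being reclaimed, and it ignores
memory that is freed progressively during reclaim, so it is a crude upper
bound. The function name and the allocation rate parameter are hypothetical.

```c
#include <assert.h>

/*
 * Illustrative model only (an assumption, not taken from the patches):
 * while the victim's memory is being reclaimed, the rest of the system
 * keeps allocating, and the free margin has to absorb those allocations.
 *
 * victim_mb:        memory held by the killed process, in MB
 * min_reclaim_mbps: worst-case reclaim rate, in MB/sec
 * alloc_mbps:       assumed allocation rate of other tasks, in MB/sec
 */
static double required_margin_mb(double victim_mb, double min_reclaim_mbps,
                                 double alloc_mbps)
{
        assert(min_reclaim_mbps > 0.0);

        /* Worst-case time until the victim's memory is fully reclaimed. */
        double reclaim_sec = victim_mb / min_reclaim_mbps;

        /* Memory consumed by everyone else during that window. */
        return alloc_mbps * reclaim_sec;
}
```

Under this model the margin is inversely proportional to the minimum
reclaim rate, so raising the worst-case rate directly shrinks the margin
that has to be kept free.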

Note that on asymmetric architectures like ARM big.LITTLE the reclaim rate
can vary dramatically depending on which core it's performed on (see test
results). It is common for a non-essential victim process to be restricted
to a less performant or throttled CPU for power saving purposes. This makes
the worst-case reclaim rate scenario very probable.

Reclaim of the victim's memory can be delayed further when the process is
blocked in an uninterruptible sleep or is performing a time-consuming
operation, which makes the reclaim time even more unpredictable.

Increasing memory reclaim rate and making it more deterministic would
allow for a smaller free memory margin and would lead to more opportunities
to avoid killing a process.

Note that while other strategies like throttling memory allocations are
viable and can be employed for other non-essential processes, they would
affect user experience if applied to an interactive process.

The proposed solution uses the existing oom-reaper thread to increase the
memory reclaim rate of a killed process and to make this rate more
deterministic. It is by no means considered the best solution; it was chosen
because it was simple to implement and allowed for test data collection.
The downside of this solution is that it requires an additional "expedite"
hint for something which has to be fast in all cases. It would be great to
find a way that does not require additional hints.
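
For illustration, a userspace caller might pass such a hint roughly as
sketched below. This is a sketch under stated assumptions, not the interface
defined by the patches: HYPOTHETICAL_EXPEDITE_FLAG and kill_by_pidfd() are
placeholder names invented for this example, and the real flag name and
value are whatever patch 2/2 defines. Only the pidfd_send_signal() syscall
itself (number 424, added in Linux 5.1) is taken as given. On a 5.1 kernel
without these patches any non-zero flags value is rejected with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424       /* syscall number since Linux 5.1 */
#endif

/*
 * HYPOTHETICAL placeholder for the "expedite" hint. The actual flag name
 * and value come from the patches, not from this sketch.
 */
#define HYPOTHETICAL_EXPEDITE_FLAG 0x1u

/*
 * Send SIGKILL to the process referred to by pidfd, optionally passing the
 * placeholder expedite hint. Returns 0 on success, -1 with errno set on
 * failure.
 */
static int kill_by_pidfd(int pidfd, unsigned int flags)
{
        /* This sketch only knows about the single placeholder hint. */
        assert((flags & ~HYPOTHETICAL_EXPEDITE_FLAG) == 0);

        /* Reject obviously invalid descriptors without entering the kernel. */
        if (pidfd < 0) {
                errno = EBADF;
                return -1;
        }

        return (int)syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, flags);
}
```

A caller would obtain the pidfd first (on 5.1, by opening the victim's
/proc/<pid> directory) and then call kill_by_pidfd(pidfd,
HYPOTHETICAL_EXPEDITE_FLAG), falling back to flags of 0 if the kernel
returns EINVAL.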

Other possible approaches include:
- Implement a dedicated syscall to perform opportunistic reclaim in the
  context of the process waiting for the victim's death. A natural boost
  occurs if the waiting process has high or RT priority and is not
  limited by cpuset cgroup in its CPU choices.
- Implement a mechanism that would perform opportunistic reclaim
  unconditionally whenever it's possible (similar to the checks in
  task_will_free_mem()).
- Implement opportunistic reclaim that uses the shrinker interface, PSI or
  other memory pressure indications as a hint to engage.

Test details:
Tests were performed on a Qualcomm Snapdragon 845 8-core ARM big.LITTLE
system with 4 little cores (0.3-1.6GHz) and 4 big cores (0.8-2.5GHz)
running Android.
Memory reclaim speed was measured using signal/signal_generate,
kmem/rss_stat and sched/sched_process_exit traces.

Test results:
powersave governor, min freq
                   normal kills      expedited kills
        little     856 MB/sec        3236 MB/sec
        big        5084 MB/sec       6144 MB/sec

performance governor, max freq
                   normal kills      expedited kills
        little     5602 MB/sec       8144 MB/sec
        big        14656 MB/sec      12398 MB/sec

schedutil governor (default)
                   normal kills      expedited kills
        little     2386 MB/sec       3908 MB/sec
        big        7282 MB/sec       6820-16386 MB/sec
=================================================================
min reclaim speed: 856 MB/sec        3236 MB/sec

The patches are based on 5.1-rc1

Suren Baghdasaryan (2):
  mm: oom: expose expedite_reclaim to use oom_reaper outside of
    oom_kill.c
  signal: extend pidfd_send_signal() to allow expedited process killing

 include/linux/oom.h          |  1 +
 include/linux/sched/signal.h |  3 ++-
 include/linux/signal.h       | 11 ++++++++++-
 ipc/mqueue.c                 |  2 +-
 kernel/signal.c              | 37 ++++++++++++++++++++++++++++--------
 kernel/time/itimer.c         |  2 +-
 mm/oom_kill.c                | 15 +++++++++++++++
 7 files changed, 59 insertions(+), 12 deletions(-)

--
2.21.0.392.gf8f6787159e-goog