[PATCH] oom, oom_reaper: allow to reap mm shared by the kthreads

From: Michal Hocko
Date: Fri Jun 10 2016 - 10:27:49 EST


The oom reaper is currently skipped for an mm which is shared with a
kernel thread (via use_mm()). The primary concern was that such a kthread
might want to read from the userspace memory and see a zero page as a
result of the oom reaper action. This seems to be overly conservative
because none of the current use_mm() users needs to do copy_from_user or
get_user. The aio code used to rely on copy_from_user but this is long
gone along with the use_mm() usage in fs/aio.c.

We currently have only 3 users in the kernel:
- ffs_user_copy_worker, ep_user_copy_worker only do copy_to_iter()
- vhost_worker only copies over to the userspace as well AFAICS

In fact, relying on copy_from_user in the kernel thread context is quite
dubious because it requires active cooperation from the userspace to
see consistent data (e.g. userspace can do MADV_DONTNEED as well).
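
To illustrate the MADV_DONTNEED point, here is a minimal userspace sketch
(not part of the patch). It assumes Linux semantics for a private
anonymous mapping, where pages dropped by MADV_DONTNEED are zero-filled
on the next access. The helper name byte_after_dontneed() is made up for
this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill one private anonymous page with 0xAA, drop it
 * with MADV_DONTNEED and return the first byte read back afterwards.
 * Returns -1 on any setup error.
 */
int byte_after_dontneed(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int val;

	if (page <= 0)
		return -1;

	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Userspace writes its data ... */
	memset(buf, 0xAA, (size_t)page);
	assert(buf[0] == 0xAA);

	/* ... and then throws the backing pages away on its own. */
	if (madvise(buf, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(buf, (size_t)page);
		return -1;
	}

	/* The next access faults in a fresh zero-filled page. */
	val = buf[0];
	munmap(buf, (size_t)page);
	return val;
}
```

A reader of that range, a kthread under use_mm() included, would see
zeroes after the madvise call without any involvement of the oom reaper.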

Add a note to use_mm about the copy_from_user risk and allow the oom
killer to invoke the oom_reaper for mms shared with kthreads. In
practice this makes all the sane use cases reapable.
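
The resulting decision for each task sharing the victim's mm can be
modelled roughly as follows. This is a standalone sketch, not kernel
code: struct model_task and model_kill_sharers() are invented for
illustration, and the shares_mm flag stands in for whatever check
precedes the first "continue" in the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task properties the loop looks at. */
struct model_task {
	bool shares_mm;   /* task uses the victim's mm */
	bool same_group;  /* same thread group as the victim */
	bool is_init;     /* global init */
	bool is_kthread;  /* PF_KTHREAD, i.e. a use_mm() user */
	bool killed;      /* set when the model "sends SIGKILL" */
};

/*
 * Walk all tasks the way the patched loop does and return whether the
 * mm may still be handed to the oom reaper.
 */
bool model_kill_sharers(struct model_task *tasks, size_t n)
{
	bool can_oom_reap = true;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct model_task *p = &tasks[i];

		if (!p->shares_mm)
			continue;
		if (p->same_group)
			continue;
		if (p->is_init) {
			/* init cannot be killed, so the mm is not reapable */
			can_oom_reap = false;
			continue;
		}
		/* kthreads are not killed but no longer block the reaper */
		if (p->is_kthread)
			continue;
		p->killed = true;
	}
	return can_oom_reap;
}
```

With this model, an mm shared only with a kthread and ordinary processes
stays reapable, while one shared with global init does not.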

Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
---
mm/mmu_context.c | 5 +++++
mm/oom_kill.c | 14 +++++++-------
2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index f802c2d216a7..27449747f8de 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -16,6 +16,11 @@
* mm context.
* (Note: this routine is intended to be called only
* from a kernel thread context)
+ *
+ * Do not use copy_from_user from this context because the
+ * address space might get reclaimed behind its back by
+ * the oom_reaper, so an unexpected zero page might be
+ * encountered.
*/
void use_mm(struct mm_struct *mm)
{
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 6303bc7caeda..b6a7027643b6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -921,13 +921,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
continue;
if (same_thread_group(p, victim))
continue;
- if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p)) {
- /*
- * We cannot use oom_reaper for the mm shared by this
- * process because it wouldn't get killed and so the
- * memory might be still used. Hide the mm from the oom
- * killer to guarantee OOM forward progress.
- */
+ if (is_global_init(p)) {
can_oom_reap = false;
set_bit(MMF_OOM_REAPED, &mm->flags);
pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
@@ -935,6 +929,12 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
task_pid_nr(p), p->comm);
continue;
}
+ /*
+ * No use_mm() user needs to read from the userspace so we are
+ * ok to reap it.
+ */
+ if (unlikely(p->flags & PF_KTHREAD))
+ continue;
do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
}
rcu_read_unlock();
--
2.8.1

--
Michal Hocko
SUSE Labs