[PATCH 13/12] ksm: fix munlock during exit_mmap deadlock

From: Andrea Arcangeli
Date: Tue Aug 25 2009 - 11:26:17 EST

From: Andrea Arcangeli <aarcange@xxxxxxxxxx>

We can't stop page faults from happening during exit_mmap, or munlock
fails. The fundamental issue is the complete lack of serialization
after mm_users reaches 0. mmap_sem should still be hot in the cache,
as we released it a few nanoseconds earlier in exit_mm. We just need
to take it one last time after mm_users reaches 0 so that drivers can
serialize safely against it. Taking mmap_sem and checking
mm_users > 0 is then enough for ksm to serialize against exit_mmap,
while still noticing when the oom killer or something else wants to
release all memory of the mm. When ksm notices that, it bails out and
allows the memory to be released.

Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

diff --git a/kernel/fork.c b/kernel/fork.c
index 9a16c21..f5af0d3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -515,7 +515,18 @@ void mmput(struct mm_struct *mm)

 	if (atomic_dec_and_test(&mm->mm_users)) {
+		/*
+		 * Allow drivers tracking mm without pinning mm_users
+		 * (so that mm_users is allowed to reach 0 while they
+		 * do their tracking) to serialize against exit_mmap
+		 * by taking mmap_sem and checking mm_users is still >
+		 * 0 before working on the mm they're tracking.
+		 */
+		down_read(&mm->mmap_sem);
+		up_read(&mm->mmap_sem);
 		set_mm_exe_file(mm, NULL);
 		if (!list_empty(&mm->mmlist)) {
diff --git a/mm/memory.c b/mm/memory.c
index 4a2c60d..025431e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2603,7 +2603,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);

 	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (!pte_none(*page_table) || ksm_test_exit(mm))
+	if (!pte_none(*page_table))
 		goto release;

 	inc_mm_counter(mm, anon_rss);
@@ -2753,7 +2753,7 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * handle that later.
 	 */
 	/* Only go through if we didn't race with anybody else... */
-	if (likely(pte_same(*page_table, orig_pte) && !ksm_test_exit(mm))) {
+	if (likely(pte_same(*page_table, orig_pte))) {
 		flush_icache_page(vma, page);
 		entry = mk_pte(page, vma->vm_page_prot);
 		if (flags & FAULT_FLAG_WRITE)