Re: [PATCH 0/3] ksm: write protect pages from inside ksm

From: Hugh Dickins
Date: Mon Jun 15 2009 - 08:24:53 EST


On Mon, 15 Jun 2009, Izik Eidus wrote:
> On Mon, 15 Jun 2009 03:05:14 +0300 Izik Eidus <ieidus@xxxxxxxxxx> wrote:
> > >>
> > >> Sure, let me check it.
> > >> (You do have Andrea patch that fix the "used after free slab
> > >> entries" ?)

I do, yes, and the other fix he posted at the same time.

> > >
> > > How fast is it crush opps to you?, I compiled it and ran it here on
> > > 2.6.30-rc4-mm1 with:
> > > "Enable SLQB debugging support" and "SLQB debugging on by default,
> > > and it run and merge (i am using qemu processes to run virtual
> > > machines to merge the pages between them)

I suspect too much of your testing has been on the sensible use of KSM,
not enough on foolish abuse of it!

> > >
> > > ("SLQB debugging on by defaul" mean i dont have to add boot
> > > pareameter right?)

That's right. (I never switch it on by default in the .config,
just love that we can switch it on by boot option when needed.)

> > OK, bug on my side, just got that oppss, will try to fix and send
> > patch.

Thanks a lot for getting there so quickly.

> Ok, below is ugly fix for the opss..

Right, for several reasons* not a patch we'd like to include in the
end, but enough to show that indeed fixes the problems I was seeing:
well caught.

[* We don't want to serialize everyone on a global lock there. You
could make it a little better using atomic_dec_and_lock(), but still
unpleasant. And that placing of the list_del() is bad for swapping:
I ended up using list_del_init() there, and restoring the check after
calling set_mm_exe_file(). But never mind, I've a better fix.]

>
> From 3be1ad5a9f990113e8849fa1e74c4e74066af131 Mon Sep 17 00:00:00 2001
> From: Izik Eidus <ieidus@xxxxxxxxxx>
> Date: Mon, 15 Jun 2009 03:52:05 +0300
> Subject: [PATCH] ksm: madvise-rfc: really ugly fix for the oppss bug.
>
> This patch is just so it can run without to crush with the madvise rfc patch.
>
> True fix for this i think is adding another list for ksm inside the mm struct.
> In the meanwhile i will try to think about other way how to fix this bug.

I'm already working with a separate list for KSM inside the mm struct
(prefer not to be mixing these two things up), but I don't see how that
helps.

>
> Hugh, i hope at least now you will be able to run it without it will crush to
> you.

Yes, thanks, and helped me to think of a better fix (I hope!) below...

>
> Signed-off-by: Izik Eidus <ieidus@xxxxxxxxxx>
> ---
> kernel/fork.c | 11 ++++++-----
> 1 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index e5ef58c..771b89a 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -484,17 +484,18 @@ void mmput(struct mm_struct *mm)
> {
> might_sleep();
>
> + spin_lock(&mmlist_lock);
> if (atomic_dec_and_test(&mm->mm_users)) {
> + if (!list_empty(&mm->mmlist))
> + list_del(&mm->mmlist);
> + spin_unlock(&mmlist_lock);
> exit_aio(mm);
> exit_mmap(mm);
> set_mm_exe_file(mm, NULL);
> - if (!list_empty(&mm->mmlist)) {
> - spin_lock(&mmlist_lock);
> - list_del(&mm->mmlist);
> - spin_unlock(&mmlist_lock);
> - }
> put_swap_token(mm);
> mmdrop(mm);
> + } else {
> + spin_unlock(&mmlist_lock);
> }
> }
> EXPORT_SYMBOL_GPL(mmput);

I believe you can restore mmput() to how it was: patch below appears
to fix it from the get_next_mmlist() end, using atomic_inc_not_zero() -
swapoff's try_to_unuse() has to use that on mm_users too. But please
check and test it carefully, I'd usually want to test a lot more myself
before sending out.

And, hey, this very same patch appears to fix the other issue I mentioned,
that I wasn't yet seeing any merging (within a single mm): I think your
mm_users 1 test was badly placed before (whether or not we already have
an mm_slot makes a big difference to the meaning of mm_users 1).

Signed-off-by: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx>

--- ksm_madvise/mm/ksm.c 2009-06-14 11:15:33.000000000 +0100
+++ linux/mm/ksm.c 2009-06-15 12:49:12.000000000 +0100
@@ -1137,7 +1137,6 @@ static void insert_to_mm_slots_hash(stru
bucket = &mm_slots_hash[((unsigned long)mm / sizeof (struct mm_struct))
% nmm_slots_hash];
mm_slot->mm = mm;
- atomic_inc(&mm_slot->mm->mm_users);
INIT_LIST_HEAD(&mm_slot->rmap_list);
hlist_add_head(&mm_slot->link, bucket);
}
@@ -1253,21 +1252,21 @@ static struct mm_slot *get_next_mmlist(s
if (test_bit(MMF_VM_MERGEABLE, &mm->flags) ||
test_bit(MMF_VM_MERGEALL, &mm->flags)) {
mm_slot = get_mm_slot(mm);
- if (unlikely(atomic_read(&mm->mm_users) == 1)) {
- if (mm_slot)
+ if (mm_slot) {
+ if (atomic_read(&mm->mm_users) == 1) {
mm_slot->touched = 0;
- } else {
- if (!mm_slot) {
- insert_to_mm_slots_hash(mm,
- pre_alloc_mm_slot);
- *used_pre_alloc = 1;
- mm_slot = pre_alloc_mm_slot;
+ goto next;
}
- mm_slot->touched = 1;
- return mm_slot;
- }
+ } else if (atomic_inc_not_zero(&mm->mm_users)) {
+ insert_to_mm_slots_hash(mm, pre_alloc_mm_slot);
+ *used_pre_alloc = 1;
+ mm_slot = pre_alloc_mm_slot;
+ } else
+ goto next;
+ mm_slot->touched = 1;
+ return mm_slot;
}
-
+next:
cur = cur->next;
}
return NULL;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/