Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()

From: Hugh Dickins
Date: Thu Jun 02 2011 - 13:30:33 EST


On Thu, 2 Jun 2011, Chris Wright wrote:
> * Andrea Righi (andrea@xxxxxxxxxxxxxxx) wrote:
> > mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> > kernel. Could you post your .config?
>
> Andrea (Righi), can you tell me if this WARN fires? This looks
> like a pure race between removing from list and checking list, i.e.
> insufficient locking.
>
> ksm_scan.mm_slot == the only registered mm
>
> CPU 1 (bug program) CPU 2 (ksmd)
> list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
> slot == &ksm_mm_head (but list is now empty_)
>
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..ab79a92 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
> slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
> ksm_scan.mm_slot = slot;
> spin_unlock(&ksm_mmlist_lock);
> + WARN_ON(slot == &ksm_mm_head);
> next_mm:
> ksm_scan.address = 0;
> ksm_scan.rmap_list = &slot->rmap_list;

AndreaR, good find, many thanks for discovering and reporting it.
I couldn't look at it until last night, and even then, it was not
obvious to me exactly where my assumptions were going wrong.

Even now it's unclear what role the SIGSEGV plays, as opposed to an
normal exit: I guess it just happens to change the timing enough to
make the race dangerous.

Your patch was not wrong, but I do prefer a patch that plugs the
exact hole; and I needed to understand what was going on - without
understanding it, there was a danger we might leak memory instead.

AndreaA, I didn't study the patch you posted half an hour ago,
since by that time I'd worked it out and was preparing patch below.
I think your patch would be for a different bug, hopefully one we
don't have, it looks more complicated than we should need for this.

ChrisW, yes, your WARN_ON is spot on, matches what I saw exactly.

I'll fill in the patch description later, must dash now, probably
offline until late tonight. Or if you're satisfied and don't want to
wait, you guys fill that in and send off to Linus & Andrew - thanks.

[PATCH] ksm: fix easily reproduced NULL pointer dereference

Reported-by: Andrea Righi <andrea@xxxxxxxxxxxxxxx>
Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: stable@xxxxxxxxxx
---

mm/ksm.c | 7 +++++++
1 file changed, 7 insertions(+)

--- 3.0-rc1/mm/ksm.c 2011-05-29 18:42:37.429882601 -0700
+++ linux/mm/ksm.c 2011-06-02 09:55:31.729702490 -0700
@@ -1302,6 +1302,13 @@ static struct rmap_item *scan_get_next_r
slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
ksm_scan.mm_slot = slot;
spin_unlock(&ksm_mmlist_lock);
+
+ /*
+ * Although we tested list_empty() above, a racing __ksm_exit
+ * of the last mm on the list may have removed it since then.
+ */
+ if (slot == &ksm_mm_head)
+ return NULL;
next_mm:
ksm_scan.address = 0;
ksm_scan.rmap_list = &slot->rmap_list;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/