Re: [PATCH 02/13] mm: Revalidate anon_vma in page_lock_anon_vma()

From: Andrew Morton
Date: Thu Apr 08 2010 - 17:22:09 EST


On Thu, 08 Apr 2010 21:17:39 +0200
Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:

> There is nothing preventing the anon_vma from being detached while we
> are spinning to acquire the lock.

Well. The comment there clearly implies (or states) that RCU
protection is used to "guard against races". If that's inaccurate
or incomplete, can we please get it fixed?


The whole function makes be a bit queasy.

- Fails to explain why it pulls all these party tricks to read
page->mapping a single time. What code path are we defending against
here?

- Then checks page_mapped() without having any apparent defence
against page_mapped() becoming untrue one nanosecond later.

- Checks page_mapped() inside the rcu_read_locked() section for
inscrutable reasons.

> Most (all?) current users end up
> calling something like vma_address(page, vma) on it, which has a
> fairly good chance of weeding out wonky vmas.
>
> However suppose the anon_vma got freed and re-used while we were
> waiting to acquire the lock, and the new anon_vma fits with the
> page->index (because that is the only thing vma_address() uses to
> determine if the page fits in a particular vma, we could end up
> traversing faulty anon_vma chains.
>
> Close this hole for good by re-validating that page->mapping still
> holds the very same anon_vma pointer after we acquire the lock, if not
> be utterly paranoid and retry the whole operation (which will very
> likely bail, because it's unlikely the page got attached to a different
> anon_vma in the meantime).
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> ---
> mm/rmap.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> Index: linux-2.6/mm/rmap.c
> ===================================================================
> --- linux-2.6.orig/mm/rmap.c
> +++ linux-2.6/mm/rmap.c
> @@ -294,6 +294,7 @@ struct anon_vma *page_lock_anon_vma(stru
> unsigned long anon_mapping;
>
> rcu_read_lock();
> +again:
> anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
> if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
> goto out;
> @@ -302,6 +303,12 @@ struct anon_vma *page_lock_anon_vma(stru
>
> anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> spin_lock(&anon_vma->lock);
> +
> + if (page_rmapping(page) != anon_vma) {
> + spin_unlock(&anon_vma->lock);
> + goto again;
> + }
> +
> return anon_vma;
> out:
> rcu_read_unlock();
>

A comment here explaining how this situation could come about would
be helpful.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/