Re: [PATCH -v2] rmap: make anon_vma_prepare link in all the anon_vmas of a mergeable VMA

From: Peter Zijlstra
Date: Mon Apr 12 2010 - 12:02:08 EST


On Mon, 2010-04-12 at 11:19 -0400, Rik van Riel wrote:
> On 04/12/2010 10:40 AM, Peter Zijlstra wrote:
>
> > So the reason page->mapping isn't cleared in page_remove_rmap() isn't
> > detailed beyond a (possible) race with page_add_anon_rmap() (which I
> > guess would be reclaim trying to unmap the page and a fault re-instating
> > it).
> >
> > This also complicates the whole page_lock_anon_vma() thing, so it would
> > be nice to be able to remove this race and clear page->mapping in
> > page_remove_rmap().
>
> For anonymous pages, I don't see where the race comes from.
>
> Both do_swap_page and the reclaim code hold the page lock
> across the entire operation, so they are already excluding
> each other.
>
> Hugh, do you remember what the race between page_remove_rmap
> and page_add_anon_rmap is/was all about?
>
> I don't see a race in the current code...


Something like the below would be nice if possible.


---
mm/rmap.c | 44 +++++++++++++++++++++++++++++++-------------
1 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index eaa7a09..241f75d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -286,7 +286,22 @@ void __init anon_vma_init(void)

/*
* Getting a lock on a stable anon_vma from a page off the LRU is
- * tricky: page_lock_anon_vma rely on RCU to guard against the races.
+ * tricky:
+ *
+ * page_add_anon_rmap()
+ *	atomic_inc_and_test(&page->_mapcount);
+ *	page->mapping = anon_vma;
+ *
+ * page_remove_rmap()
+ *	atomic_add_negative(-1, &page->_mapcount);
+ *	page->mapping = NULL;
+ *
+ * So we have to first read page->mapping, then verify _mapcount,
+ * and make sure we order those reads against the stores in
+ * page_remove_rmap().
+ *
+ * We take anon_vma->lock in between so that if we see the anon_vma
+ * with a mapcount we know it won't go away on us.
*/
struct anon_vma *page_lock_anon_vma(struct page *page)
{
@@ -294,14 +309,24 @@ struct anon_vma *page_lock_anon_vma(struct page *page)
unsigned long anon_mapping;

rcu_read_lock();
- anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
+ anon_mapping = (unsigned long)rcu_dereference(page->mapping);
if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
goto out;
- if (!page_mapped(page))
- goto out;

anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
spin_lock(&anon_vma->lock);
+
+ /*
+ * Order the reading of page->mapping and page->_mapcount against the
+ * mb() implied by the atomic_add_negative() in page_remove_rmap().
+ */
+ smp_rmb();
+ if (!page_mapped(page)) {
+ spin_unlock(&anon_vma->lock);
+ anon_vma = NULL;
+ goto out;
+ }
+
return anon_vma;
out:
rcu_read_unlock();
@@ -864,15 +889,8 @@ void page_remove_rmap(struct page *page)
__dec_zone_page_state(page, NR_FILE_MAPPED);
mem_cgroup_update_file_mapped(page, -1);
}
- /*
- * It would be tidy to reset the PageAnon mapping here,
- * but that might overwrite a racing page_add_anon_rmap
- * which increments mapcount after us but sets mapping
- * before us: so leave the reset to free_hot_cold_page,
- * and remember that it's only reliable while mapped.
- * Leaving it set also helps swapoff to reinstate ptes
- * faster for those pages still in swapcache.
- */
+
+ page->mapping = NULL;
}

/*


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/