Re: [PATCH -v2] rmap: make anon_vma_prepare link in all theanon_vmas of a mergeable VMA

From: Peter Zijlstra
Date: Mon Apr 12 2010 - 10:44:17 EST


On Sat, 2010-04-10 at 11:21 -0700, Linus Torvalds wrote:
>

> Ho humm.
>
> Maybe I'm crazy, but something started bothering me. And I started
> wondering: when is the 'page->mapping' of an anonymous page actually
> cleared?
>
> The thing is, the mapping of an anonymous page is actually cleared only
> when the page is _freed_, in "free_hot_cold_page()".
>
> Now, let's think about that. And in particular, let's think about how that
> relates to the freeing of the 'anon_vma' that the page->mapping points to.
>
> The way the anon_vma is freed is when the mapping is torn down, and we do
> roughly:
>
> tlb = tlb_gather_mmu(mm,..)
> ..
> unmap_vmas(&tlb, vma ..
> ..
> free_pgtables()
> ..
> tlb_finish_mmu(tlb, start, end);
>
> and we actually unmap all the pages in "unmap_vmas()", and then _after_
> unmapping all the pages we do the "unlink_anon_vmas(vma);" in
> "free_pgtables()". Fine so far - the anon_vma stay around until after the
> page has been happily unmapped.
>
> But "unmapped all the pages" is _not_ actually the same as "free'd all the
> pages". The actual _freeing_ of the page happens generally in
> tlb_finish_mmu(), because we can free the page only after we've flushed
> any TLB entries.
>
> So what we have in that tlb_gather structure is a list of _pending_ pages
> to be freed, while we already actually free'd the anon_vmas earlier!
>
> Now, the thing is, tlb_gather_mmu() begins a preempt-safe region (because
> we use a per-cpu variable), but as far as I can tell it is _not_ an
> RCU-safe region.
>
> So I think we might actually get a real RCU freeing event while this all
> happens. So now the 'anon_vma' that 'page->mapping' points to has not just
> been released back to the SLUB caches, the page itself might have been
> released too.
>
> I dunno. Does the above sound at all sane? Or am I just raving?
>
> Something hacky like the above might fix it if I'm not just raving. I
> really might be missing something here.

Right, so unless you have CONFIG_TREE_PREEMPT_RCU=y, the preempt-disable
== RCU read lock assumption does hold.

But even with your patch it doesn't close all holes because while
zap_pte_range() can remove the last mapcount of the page, the
page_remove_tlb() et al. don't need to be the last use count of the
page.

Concurrent reclaim/gup/whatever could still have a count out on the page
delaying the actual free beyond the tlb gather RCU section.

So the reason page->mapping isn't cleared in page_remove_rmap() isn't
detailed beyond a (possible) race with page_add_anon_rmap() (which I
guess would be reclaim trying to unmap the page and a fault re-instating
it).

This also complicates the whole page_lock_anon_vma() thing, so it would
be nice to be able to remove this race and clear page->mapping in
page_remove_rmap().
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/