Re: [PATCH v2 1/1] mm: fix the theoretical compound_lock() vsprep_new_page() race

From: Oleg Nesterov
Date: Wed Jan 08 2014 - 14:04:54 EST


On 01/08, Mel Gorman wrote:
>
> On Wed, Jan 08, 2014 at 05:13:38PM +0100, Oleg Nesterov wrote:
> >
> > Yes. But, for example, get_futex_key() does
> >
> > if (unlikely(PageTail(page))) {
> > put_page(page);
> >
> > why this put_page() can't race with _split? If nothing else, another thread
> > can unmap the part of this vma.
> >
>
> The race is not prevented but that does not mean it matters. Basic
> scenario where a split starts after the PageTail check but before the
> put_page in get_futex_key
>
> CPU A
> get_futex_key
> -> fast gup, page table removing prevents parallel unmap and free
> -> gup_huge_pmd (arch/x86/mm/gup.c at least)
> -> get_huge_page_tail (increment page tail _map_count)
> -> get_huge_page_multiple (increment ref on head page)
> -> Check PageTail
> CPU B
> split_huge_page_to_list
> -> split_huge_page_refcount
> spin_lock_irq(lru_lock)
> compound_lock
> -> put_page(tail_page)
> ->put_compound_page
> looks up head page

Yes.

But suppose that CPU B completes split_huge_page_to_list/munmap/etc
and frees this head page.

> takes reference unless zero

suppose this page_head was reallocated and get_page_unless_zero()
succeds right after set_page_refcounted(),

> compound_lock (block)
> complete split
> compound_unlock
> check PageTail
>
> This put_page blocks on the compound lock, finds the page is no longer a
> PageTail

Sure. The problem is that compound_lock() itself can race with prep_new_page()
or I missed something.

> The parallel unmap is prevented by get_huge_page_multiple in the gup path
> and held in place until put_page_compound frees it later.

Again, I can easily miss something. And yes, page_tail returned by gup
has a reference to its page_head (via page_head->_count). But
__split_huge_page_refcount() destroys this connection and decrements
page_head->_count.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/