Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

From: Naoya Horiguchi
Date: Wed Mar 20 2013 - 18:01:08 EST


On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote:
...
> >>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> >>> index e2df1c1..8627135 100644
> >>> --- v3.8.orig/mm/mempolicy.c
> >>> +++ v3.8/mm/mempolicy.c
> >>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> >>> return addr != end;
> >>> }
> >>>
> >>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> >>> + const nodemask_t *nodes, unsigned long flags,
> >>> + void *private)
> >>> +{
> >>> +#ifdef CONFIG_HUGETLB_PAGE
> >>> + int nid;
> >>> + struct page *page;
> >>> +
> >>> + spin_lock(&vma->vm_mm->page_table_lock);
> >>> + page = pte_page(huge_ptep_get((pte_t *)pmd));
> >>> + spin_unlock(&vma->vm_mm->page_table_lock);
> >> I am a bit confused why page_table_lock is used here and why it doesn't
> >> cover the page usage.
> > I expected this function to do the same for pmd as check_pte_range() does
> > for pte, but the above code didn't do it. I should've put spin_unlock
> > below migrate_hugepage_add(). Sorry for the confusion.
>
> I'm still confused! Could you explain in more detail?

With the above code, check_hugetlb_pmd_range() checks page_mapcount
outside the page table lock, but the mapcount can be decremented
concurrently by __unmap_hugepage_range(), so there is a race.
Since __unmap_hugepage_range() calls page_remove_rmap() while holding
the page table lock, we can close the race by doing all of
check_hugetlb_pmd_range()'s work inside the page table lock.
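
To make this concrete, here is a minimal sketch of how the corrected
function could look, with spin_unlock() moved below the mapcount check
and the migrate_hugepage_add() call. The node check is assumed to
mirror check_pte_range(), and migrate_hugepage_add() is the helper
introduced earlier in this series (its exact signature is assumed here
to match migrate_page_add()); this is an illustration, not the final
patch:

static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
		const nodemask_t *nodes, unsigned long flags,
		void *private)
{
#ifdef CONFIG_HUGETLB_PAGE
	int nid;
	struct page *page;

	/*
	 * Hold the lock across the mapcount check and the call to
	 * migrate_hugepage_add(), so that page_remove_rmap() from
	 * __unmap_hugepage_range() cannot change the mapcount under us.
	 */
	spin_lock(&vma->vm_mm->page_table_lock);
	page = pte_page(huge_ptep_get((pte_t *)pmd));
	nid = page_to_nid(page);
	if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
		goto unlock;
	/* As in check_pte_range(), only migrate an unshared hugepage. */
	if ((flags & (MPOL_MF_MOVE|MPOL_MF_MOVE_ALL)) &&
	    page_mapcount(page) == 1)
		migrate_hugepage_add(page, private, flags);
unlock:
	spin_unlock(&vma->vm_mm->page_table_lock);
#endif
}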

Thanks,
Naoya