Re: mm: fix BUG in __split_huge_page_pmd

From: Andrea Arcangeli
Date: Tue Oct 15 2013 - 10:41:43 EST


On Tue, Oct 15, 2013 at 02:32:54PM +0300, Kirill A. Shutemov wrote:
> Hugh Dickins wrote:
> > Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
> > __split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).
> >
> > It's invalid: we don't always have down_write of mmap_sem there:
> > a racing do_huge_pmd_wp_page() might have copied-on-write to another
> > huge page before our split_huge_page() got the anon_vma lock.
> >
> > Forget the BUG_ON, just go back and try again if this happens.
> >
> > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
>
> Looks reasonable to me.
>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>
> madvise(MADV_DONTNEED) was problematic with THP before. Is it a big win having
> mmap_sem taken on read rather than on write for it?

Yeah, it caused all those pmd_trans_unstable,
pmd_none_or_trans_huge_or_clear_bad and pmd_read_atomic checks in common
code. But I didn't want to regress the scalability of
MADV_DONTNEED... I think various apps use MADV_DONTNEED to free memory
(including KVM in the balloon driver and probably JVM and other JIT).

none or huge pmds are unstable without mmap_sem for writing and
without page_table_lock (or in general pmd_trans_huge_lock).
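
To make the "unstable without the lock" point concrete, here is a toy
userspace sketch of the peek, lock, recheck pattern. It is not kernel
code: toy_pmd, toy_lock, toy_unlock and toy_huge_lock are names made up
for illustration, the spinlock only stands in for page_table_lock, and
the helper is merely in the spirit of pmd_trans_huge_lock without
copying its real signature or return convention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are kernel types. */
enum toy_state { TOY_NONE, TOY_HUGE, TOY_TABLE };

struct toy_pmd {
	atomic_flag lock;	/* plays the role of page_table_lock */
	_Atomic int state;	/* may change whenever the lock is not held */
};

static void toy_lock(struct toy_pmd *pmd)
{
	while (atomic_flag_test_and_set(&pmd->lock))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_pmd *pmd)
{
	atomic_flag_clear(&pmd->lock);
}

/*
 * Returns true with the lock held if the pmd is still huge after
 * rechecking under the lock; returns false with the lock dropped
 * otherwise, and the caller has to re-evaluate what the pmd is now.
 */
static bool toy_huge_lock(struct toy_pmd *pmd)
{
	assert(pmd != NULL);

	if (atomic_load(&pmd->state) != TOY_HUGE)
		return false;	/* unlocked peek: only a hint */

	toy_lock(pmd);
	if (atomic_load(&pmd->state) == TOY_HUGE)
		return true;	/* rechecked under the lock: stable */

	toy_unlock(pmd);	/* lost a race between the peek and the lock */
	return false;
}
```

The unlocked peek is only an optimization; the answer that counts is the
one read after the lock is taken, and a false return means "look again",
not "this can never be huge".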

It's identical to the pte being unstable if mmap_sem is held for
reading and we don't hold the PT lock, except the pte can only have
two states and they're both unstable.

hugepmds have three states, and the only stable state of the tree is
when it points to a regular pte (the third state that 4k ptes cannot have).
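
Spelled out as a toy table, in case it helps. This is a userspace
sketch, not kernel code: the enum and function names are invented for
illustration, and calling the two pte states "none" and "mapped" is an
assumption about what the two states are. "Stable" here means stable
with mmap_sem held only for reading and no page table lock taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three pmd states discussed above. */
enum toy_pmd_kind {
	TOY_PMD_KIND_NONE,	/* nothing mapped yet */
	TOY_PMD_KIND_TRANS_HUGE,	/* maps a transparent huge page */
	TOY_PMD_KIND_PTE_TABLE,	/* points to a regular page of ptes */
};

/* Hypothetical labels for the two pte states (an assumption). */
enum toy_pte_kind {
	TOY_PTE_KIND_NONE,
	TOY_PTE_KIND_MAPPED,
};

/* Only the "points to a regular pte" state can be trusted unlocked. */
static bool toy_pmd_stable_unlocked(enum toy_pmd_kind kind)
{
	return kind == TOY_PMD_KIND_PTE_TABLE;
}

/* Neither pte state is stable without the PT lock. */
static bool toy_pte_stable_unlocked(enum toy_pte_kind kind)
{
	(void)kind;
	return false;
}
```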