Re: [RFC PATCH] mm/thp: Use new function to clear pmd before THP splitting

From: Kirill A. Shutemov
Date: Tue May 05 2015 - 20:11:47 EST


On Mon, May 04, 2015 at 10:59:16PM +0530, Aneesh Kumar K.V wrote:
> Archs like ppc64 require pte_t * to remain stable in some code paths.
> They use local_irq_disable to prevent a parallel split. Generic code
> clears the pmd instead of marking it _PAGE_SPLITTING in code paths
> where we can afford to mark the pmd none before splitting. Use a
> variant of pmdp_splitting_clear_notify that an arch can override.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>

Sorry, I'm still trying to wrap my head around this problem.

So, Power has __find_linux_pte_or_hugepte(), which does a lockless lookup
in the page tables with local interrupts disabled. For huge pages it casts
pmd_t to pte_t. Since the format of pte_t differs from that of pmd_t, we
want to prevent a transition from a pmd pointing to a page table to a pmd
pointing to a huge page (and back) while interrupts are disabled.
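To make that hazard concrete, here is a small user-space C model (not
kernel code). The entry encoding, PMD_HUGE, pte_pool, mk_huge_pmd(),
mk_table_pmd() and lockless_lookup() are all hypothetical; the real ppc64
formats differ. The walker snapshots the pmd once and, for a huge entry,
hands back a pointer to the pmd slot itself, mirroring the pmd_t to pte_t
cast. If the slot is then rewritten to point at a page table while the
caller still holds that pointer, the caller decodes a table reference as
if it were a huge-page pte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding, NOT the real ppc64 layout:
 *   bit 0 set   -> huge (leaf) entry; upper bits hold a fake PFN
 *   bit 0 clear -> non-zero value encodes (table index + 1) << 1    */
#define PMD_HUGE           0x1ULL
#define FAKE_PTRS_PER_PTE  8
#define FAKE_NR_TABLES     2

/* Fake PTE pages that a non-huge pmd can point at. */
static uint64_t pte_pool[FAKE_NR_TABLES][FAKE_PTRS_PER_PTE];

uint64_t mk_huge_pmd(uint64_t pfn)    { return (pfn << 1) | PMD_HUGE; }
uint64_t mk_table_pmd(unsigned table) { return ((uint64_t)table + 1) << 1; }

/* Lockless walk: read the pmd slot once and act only on that snapshot
 * (kernel code would use READ_ONCE()). For a huge entry, return the pmd
 * slot itself, mirroring the pmd_t * -> pte_t * cast. */
uint64_t *lockless_lookup(uint64_t *pmdp, unsigned idx)
{
    assert(idx < FAKE_PTRS_PER_PTE);
    uint64_t pmd = *pmdp;
    if (pmd == 0)
        return NULL;                  /* pmd_none: nothing mapped */
    if (pmd & PMD_HUGE)
        return pmdp;                  /* the "pte" is the pmd slot */
    uint64_t table = (pmd >> 1) - 1;
    assert(table < FAKE_NR_TABLES);
    return &pte_pool[table][idx];
}
```

The interesting step is the last one: after the slot is rewritten with
mk_table_pmd(0), a caller still holding the pointer from the huge-page
lookup reads a table reference where it expects a huge pte, which is the
transition that disabling interrupts is meant to rule out.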

The complication for Power is that it doesn't do an implicit IPI on TLB
flush.
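Why an IPI gives the walker protection can be modelled in a few lines of
user-space C. Everything below (struct fake_cpu, send_ipi_all(),
cpu_poll(), cpu_irq_disable(), cpu_irq_enable(), ipi_sync_done()) is a
hypothetical simulation, not a kernel API. A CPU only services an IPI
while its interrupts are enabled, so a synchronous IPI broadcast cannot
complete while any CPU is still inside an IRQ-disabled walk. Where TLB
flushes are done with such IPIs, the flush itself provides this exclusion;
where TLBs are invalidated without IPIs, as assumed above for Power,
nothing holds the writer back unless it sends one explicitly.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_FAKE_CPUS 4

struct fake_cpu {
    bool irqs_off;     /* between local_irq_disable() and local_irq_enable() */
    bool ipi_pending;  /* IPI sent but not yet handled */
};

static struct fake_cpu cpus[NR_FAKE_CPUS];

/* Writer side: send an IPI to every CPU (in the spirit of kick_all_cpus_sync()). */
void send_ipi_all(void)
{
    for (int i = 0; i < NR_FAKE_CPUS; i++)
        cpus[i].ipi_pending = true;
}

/* A CPU can only handle a pending IPI while its interrupts are enabled. */
void cpu_poll(int cpu)
{
    assert(cpu >= 0 && cpu < NR_FAKE_CPUS);
    if (!cpus[cpu].irqs_off)
        cpus[cpu].ipi_pending = false;
}

void cpu_irq_disable(int cpu)
{
    assert(cpu >= 0 && cpu < NR_FAKE_CPUS);
    cpus[cpu].irqs_off = true;
}

/* Re-enabling interrupts lets the CPU service any IPI that arrived meanwhile. */
void cpu_irq_enable(int cpu)
{
    assert(cpu >= 0 && cpu < NR_FAKE_CPUS);
    cpus[cpu].irqs_off = false;
    cpu_poll(cpu);
}

/* A synchronous broadcast returns only once every CPU has handled the IPI. */
bool ipi_sync_done(void)
{
    for (int i = 0; i < NR_FAKE_CPUS; i++)
        if (cpus[i].ipi_pending)
            return false;
    return true;
}
```

For example, if CPU 1 disables interrupts, the writer broadcasts, and
every CPU polls, ipi_sync_done() stays false until CPU 1 calls
cpu_irq_enable(1).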

Is that correct?

For THP, the split_huge_page() and collapse sides are covered. This patch
should address the two cases of splitting a PMD, but not of splitting a
compound page, in current upstream.

But I think there's still a *big* problem for Power -- zap_huge_pmd().

For instance, another CPU can shoot out a THP PMD with MADV_DONTNEED and
fault in small pages instead. IIUC, for __find_linux_pte_or_hugepte(),
that's equivalent to splitting.

I don't see how this can be fixed without a kick_all_cpus_sync() in every
pmdp_clear_flush() on Power.
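As a rough illustration of the zap_huge_pmd() case and the suggested fix,
here is another user-space C model (again not kernel code; struct fake_mm,
walker_begin(), walker_end(), zap_huge(), fault_small() and
walker_sees_split() are hypothetical). The kick flag stands in for calling
kick_all_cpus_sync() after clearing the pmd. The model assumes, for
simplicity, that the refault cannot install a PTE table while the zapping
side is still waiting for its IPI to be serviced. Without the kick, a
walker that snapshotted the huge pmd ends up looking at a PTE table, just
as if the pmd had been split; with the kick, the refault has to wait
until that walker re-enables interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pmd states for the sketch. */
enum pmd_state { PMD_NONE, PMD_HUGE_PAGE, PMD_PTE_TABLE };

struct fake_mm {
    enum pmd_state pmd;
    bool walker_irqs_off;      /* one remote CPU inside a lockless walk */
    enum pmd_state walker_saw; /* what that walker decoded the pmd as */
    bool ipi_pending;          /* kick sent, not yet serviced by the walker */
};

/* Remote CPU: local_irq_disable(), then snapshot the pmd and keep using it. */
void walker_begin(struct fake_mm *mm)
{
    mm->walker_irqs_off = true;
    mm->walker_saw = mm->pmd;
}

/* Remote CPU: local_irq_enable(); any pending IPI is serviced now. */
void walker_end(struct fake_mm *mm)
{
    mm->walker_irqs_off = false;
    mm->ipi_pending = false;
}

/* MADV_DONTNEED path: clear the huge pmd. With kick == true, also IPI all
 * CPUs; a walker with interrupts enabled would service it immediately. */
void zap_huge(struct fake_mm *mm, bool kick)
{
    assert(mm->pmd == PMD_HUGE_PAGE);
    mm->pmd = PMD_NONE;
    if (kick && mm->walker_irqs_off)
        mm->ipi_pending = true;
}

/* Fault in small pages. Simplifying assumption: this cannot install a PTE
 * table while the zapping side is still waiting on its IPI. */
bool fault_small(struct fake_mm *mm)
{
    if (mm->ipi_pending)
        return false;              /* must wait */
    assert(mm->pmd == PMD_NONE);
    mm->pmd = PMD_PTE_TABLE;
    return true;
}

/* The bad outcome: a walker still holds a huge-page view of a pmd that now
 * points at a PTE table, i.e. a split from the walker's point of view. */
bool walker_sees_split(const struct fake_mm *mm)
{
    return mm->walker_irqs_off && mm->walker_saw == PMD_HUGE_PAGE &&
           mm->pmd == PMD_PTE_TABLE;
}
```

The cost of this approach is that every such clear waits for all CPUs to
take an interrupt, not just the ones that might be walking this mm.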

--
Kirill A. Shutemov