Re: [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing

From: Mike Kravetz
Date: Mon Apr 25 2022 - 20:20:29 EST


On 4/24/22 07:50, Baolin Wang wrote:
> The cache level flush will always be first when changing an existing
> virtual->physical mapping to a new value, since this allows us to
> properly handle systems whose caches are strict and require a
> virtual->physical translation to exist for a virtual address. So we
> should move the cache flushing before huge_pmd_unshare().
>
> As Muchun pointed out[1], the architectures that support hugetlb PMD
> sharing currently have no cache flush issues in practice. But I think
> we should still follow the cache/TLB flushing rules when changing a
> valid virtual address mapping, in case of potential issues in the
> future.
>
> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@xxxxxxxxxxxxxxxxxxxxx/
> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
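
For reference, the ordering rule being described above looks roughly like
this (untested sketch, not part of the patch):

        /* flush while the old virtual->physical translation still exists */
        flush_cache_range(vma, start, end);
        /* ... clear or modify the page table entries ... */
        flush_tlb_range(vma, start, end);
        mmu_notifier_invalidate_range(mm, start, end);
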
> ---
> mm/rmap.c | 40 ++++++++++++++++++++++------------------
> 1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 61e63db..81872bb 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> * do this outside rmap routines.
> */
> VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> + /*
> + * huge_pmd_unshare unmapped an entire PMD page.

Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
an entire PMD page?
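
Something like this, perhaps (just a wording suggestion):

        /*
         * huge_pmd_unshare() may unmap an entire PMD page.  There is no
         * way of knowing exactly which PMDs may be cached for this mm,
         * so we must flush them all.  start/end were already adjusted
         * above to cover this range.
         */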

> + * There is no way of knowing exactly which PMDs may
> + * be cached for this mm, so we must flush them all.
> + * start/end were already adjusted above to cover this
> + * range.
> + */
> + flush_cache_range(vma, range.start, range.end);
> +
> if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> - /*
> - * huge_pmd_unshare unmapped an entire PMD
> - * page. There is no way of knowing exactly
> - * which PMDs may be cached for this mm, so
> - * we must flush them all. start/end were
> - * already adjusted above to cover this range.
> - */
> - flush_cache_range(vma, range.start, range.end);
> flush_tlb_range(vma, range.start, range.end);
> mmu_notifier_invalidate_range(mm, range.start,
> range.end);
> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> page_vma_mapped_walk_done(&pvmw);
> break;
> }
> + } else {
> + flush_cache_page(vma, address, pte_pfn(*pvmw.pte));

I know this call to flush_cache_page() existed before your change. But,
looking at it now, I wonder how hugetlb pages are handled. Are there any
versions of flush_cache_page() that take page size into account?
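
If not, then for a hugetlb mapping I would expect the whole huge page to
need flushing by hand, something like the below (untested sketch, only to
illustrate the question, not a suggestion for this patch):

        if (folio_test_hugetlb(folio))
                /* flush the whole huge page, not just one PAGE_SIZE page */
                flush_cache_range(vma, address,
                                  address + huge_page_size(hstate_vma(vma)));
        else
                flush_cache_page(vma, address, pte_pfn(*pvmw.pte));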

--
Mike Kravetz

> }
>
> /*
> * Nuke the page table entry. When having to clear
> * PageAnonExclusive(), we always have to flush.
> */
> - flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> if (should_defer_flush(mm, flags) && !anon_exclusive) {
> /*
> * We clear the PTE but do not flush so potentially
> @@ -1890,15 +1892,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> * do this outside rmap routines.
> */
> VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> + /*
> + * huge_pmd_unshare unmapped an entire PMD page.
> + * There is no way of knowing exactly which PMDs may
> + * be cached for this mm, so we must flush them all.
> + * start/end were already adjusted above to cover this
> + * range.
> + */
> + flush_cache_range(vma, range.start, range.end);
> +
> if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> - /*
> - * huge_pmd_unshare unmapped an entire PMD
> - * page. There is no way of knowing exactly
> - * which PMDs may be cached for this mm, so
> - * we must flush them all. start/end were
> - * already adjusted above to cover this range.
> - */
> - flush_cache_range(vma, range.start, range.end);
> flush_tlb_range(vma, range.start, range.end);
> mmu_notifier_invalidate_range(mm, range.start,
> range.end);
> @@ -1915,10 +1918,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> page_vma_mapped_walk_done(&pvmw);
> break;
> }
> + } else {
> + flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> }
>
> /* Nuke the page table entry. */
> - flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> pteval = ptep_clear_flush(vma, address, pvmw.pte);
>
> /* Set the dirty flag on the folio now the pte is gone. */