Re: [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing

From: Baolin Wang
Date: Tue Apr 26 2022 - 02:26:02 EST




On 4/26/2022 8:20 AM, Mike Kravetz wrote:
On 4/24/22 07:50, Baolin Wang wrote:
The cache level flush will always be first when changing an existing
virtual-->physical mapping to a new value, since this allows us to
properly handle systems whose caches are strict and require a
virtual-->physical translation to exist for a virtual address. So we
should move the cache flushing before huge_pmd_unshare().

As Muchun pointed out[1], the architectures that currently support hugetlb
PMD sharing have no cache flush issues in practice. But I think we
should still follow the cache/TLB flushing rules when changing a valid
virtual address mapping, in case of potential issues in the future.

[1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@xxxxxxxxxxxxxxxxxxxxx/
Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
mm/rmap.c | 40 ++++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db..81872bb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* do this outside rmap routines.
*/
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+ /*
+ * huge_pmd_unshare unmapped an entire PMD page.

Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
an entire PMD page?

Sure, will do.


+ * There is no way of knowing exactly which PMDs may
+ * be cached for this mm, so we must flush them all.
+ * start/end were already adjusted above to cover this
+ * range.
+ */
+ flush_cache_range(vma, range.start, range.end);
+
if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
- /*
- * huge_pmd_unshare unmapped an entire PMD
- * page. There is no way of knowing exactly
- * which PMDs may be cached for this mm, so
- * we must flush them all. start/end were
- * already adjusted above to cover this range.
- */
- flush_cache_range(vma, range.start, range.end);
flush_tlb_range(vma, range.start, range.end);
mmu_notifier_invalidate_range(mm, range.start,
range.end);
@@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
page_vma_mapped_walk_done(&pvmw);
break;
}
+ } else {
+ flush_cache_page(vma, address, pte_pfn(*pvmw.pte));

I know this call to flush_cache_page() existed before your change. But, when
looking at this now I wonder how hugetlb pages are handled? Are there any
versions of flush_cache_page() that take page size into account?

Thanks for the reminder. I checked the flush_cache_page() implementation on some architectures (such as arm32), and they do not take hugetlb pages into account, so I think we may fail to flush the whole cache range for hugetlb pages on some architectures.

With this patch, we can mitigate this issue, since we now use flush_cache_range() to cover the possible range when flushing the cache for hugetlb pages. But for anon hugetlb pages, we should also convert to
flush_cache_range(). I think we can do this conversion in a separate patch set, after checking all the places that use flush_cache_page() to flush the cache for hugetlb pages. What do you think?
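
For reference, the ordering rule discussed above (flush the cache first, then change the page tables, then flush the TLB) can be sketched as a small userspace mock. This is only an illustration under stated assumptions, not kernel code: every mock_* name below is hypothetical, and each helper merely records that it was called so the order can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types standing in for real hardware side effects. */
enum mock_event {
	EV_FLUSH_CACHE,		/* stands in for flush_cache_range() */
	EV_CHANGE_MAPPING,	/* stands in for the page table update */
	EV_FLUSH_TLB,		/* stands in for flush_tlb_range() */
};

#define MOCK_LOG_MAX 8

/* Hypothetical log that records the order in which the steps run. */
struct mock_log {
	enum mock_event events[MOCK_LOG_MAX];
	size_t count;
};

static void mock_record(struct mock_log *log, enum mock_event ev)
{
	assert(log->count < MOCK_LOG_MAX);
	log->events[log->count++] = ev;
}

/*
 * Mock of the sequence: the cache flush runs while the old translation
 * is still valid, then the mapping changes, then the TLB is flushed.
 */
static void mock_unmap_range(struct mock_log *log)
{
	mock_record(log, EV_FLUSH_CACHE);
	mock_record(log, EV_CHANGE_MAPPING);
	mock_record(log, EV_FLUSH_TLB);
}

/* Returns 1 if the log shows cache flush, mapping change, TLB flush. */
static int mock_order_is_valid(const struct mock_log *log)
{
	return log->count == 3 &&
	       log->events[0] == EV_FLUSH_CACHE &&
	       log->events[1] == EV_CHANGE_MAPPING &&
	       log->events[2] == EV_FLUSH_TLB;
}
```

The point of the ordering is that a cache which needs a valid virtual-->physical translation to flush an address can still find one when the cache flush runs before the mapping is torn down.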