[PATCH v2] mm/hugetlb: Fix uffd wr-protection for CoW optimization path

From: Peter Xu
Date: Tue Mar 21 2023 - 14:58:42 EST


This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
writable even with uffd-wp bit set. It only happens with hugetlb private
mappings, when someone firstly wr-protects a missing pte (which will
install a pte marker), followed by a write to the page. That will trigger
a missing fault and an optimized CoW in the same fault stack.

Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
even reaching hugetlb_wp() to avoid taking more locks that userfault won't
need. However there's one CoW optimization path for missing hugetlb page
that can trigger hugetlb_wp() inside hugetlb_no_page(), that can bypass the
userfaultfd-wp traps.

A few ways to resolve this:

(1) Skip the CoW optimization for hugetlb private mapping, considering
that private mappings for hugetlb should be very rare, so it may not
really be helpful to major workloads. The worst case is we only skip the
optimization if userfaultfd_wp(vma)==true, because uffd-wp needs another
fault anyway.

(2) Move the userfaultfd-wp handling for hugetlb from hugetlb_fault()
into hugetlb_wp(). The major cons is there're a bunch of locks taken
when calling hugetlb_wp(), and that will make the changeset unnecessarily
complicated due to the lock operations.

(3) Skip hugetlb_wp() when uffd-wp bit is still set. It means it
requires another hugetlb_fault() to resolve the uffd-wp bit first.

This patch chose option (3) which contains the minimum changeset (simplest
for backport) and also make sure hugetlb_wp() itself will start to be
always safe with uffd-wp ptes even if called elsewhere in the future.

This patch will be needed for v5.19+ hence copy stable.

Reported-by: Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx>
Cc: linux-stable <stable@xxxxxxxxxxxxxxx>
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
mm/hugetlb.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bfd07f4c143..b60959f2a3f0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
struct folio *pagecache_folio, spinlock_t *ptl)
{
const bool unshare = flags & FAULT_FLAG_UNSHARE;
- pte_t pte;
+ pte_t pte = huge_ptep_get(ptep);
struct hstate *h = hstate_vma(vma);
struct page *old_page;
struct folio *new_folio;
@@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long haddr = address & huge_page_mask(h);
struct mmu_notifier_range range;

+ /*
+ * Never handle CoW for uffd-wp protected pages. It should be only
+ * handled when the uffd-wp protection is removed.
+ *
+ * Note that only the CoW optimization path can trigger this and
+ * got skipped, because hugetlb_fault() will always resolve uffd-wp
+ * bit first.
+ */
+ if (huge_pte_uffd_wp(pte))
+ return 0;
+
/*
* hugetlb does not support FOLL_FORCE-style write faults that keep the
* PTE mapped R/O such as maybe_mkwrite() would do.
@@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;
}

- pte = huge_ptep_get(ptep);
old_page = pte_page(pte);

delayacct_wpcopy_start();
--
2.39.1


--le5JTZlZZI66lfKs--