On Sat, Jul 19, 2025 at 07:16:48PM +0530, Dev Jain wrote:
On 19/07/25 12:19 am, Lorenzo Stoakes wrote:
On Fri, Jul 18, 2025 at 02:32:43PM +0530, Dev Jain wrote:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
Thanks!

You're welcome :)

---
mm/mprotect.c | 125 +++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 113 insertions(+), 12 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index a1c7d8a4648d..2ddd37b2f462 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -106,7 +106,7 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
}
static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
- pte_t pte, int max_nr_ptes)
+ pte_t pte, int max_nr_ptes, fpb_t flags)
{
/* No underlying folio, so cannot batch */
if (!folio)
@@ -115,7 +115,7 @@ static int mprotect_folio_pte_batch(struct folio *folio, pte_t *ptep,
if (!folio_test_large(folio))
return 1;
- return folio_pte_batch(folio, ptep, pte, max_nr_ptes);
+ return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr_ptes, flags);
}
static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
@@ -177,6 +177,102 @@ static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
return ret;
}
+/* Set nr_ptes number of ptes, starting from idx */
+static void prot_commit_flush_ptes(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, pte_t oldpte, pte_t ptent, int nr_ptes,
+ int idx, bool set_write, struct mmu_gather *tlb)
+{
+ /*
+ * Advance the position in the batch by idx; note that if idx > 0,
+ * then the nr_ptes passed here is <= batch size - idx.
+ */
+ addr += idx * PAGE_SIZE;
+ ptep += idx;
+ oldpte = pte_advance_pfn(oldpte, idx);
+ ptent = pte_advance_pfn(ptent, idx);
+
+ if (set_write)
+ ptent = pte_mkwrite(ptent, vma);
+
+ modify_prot_commit_ptes(vma, addr, ptep, oldpte, ptent, nr_ptes);
+ if (pte_needs_flush(oldpte, ptent))
+ tlb_flush_pte_range(tlb, addr, nr_ptes * PAGE_SIZE);
+}
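
An aside for anyone following the idx/nr_ptes bookkeeping above: it amounts to "skip idx entries into the batch, then commit the next nr_ptes". A tiny compile-and-run userspace sketch of that arithmetic, purely illustrative (commit_sub_batch and the int array are made-up stand-ins, not the kernel code):

/* Userspace model of the idx/nr_ptes bookkeeping; not kernel code. */
#include <assert.h>
#include <stdio.h>

#define BATCH_SIZE 8

/* Stand-in for "commit nr entries starting at position idx of the batch". */
static void commit_sub_batch(int *batch, int idx, int nr)
{
	int *p = batch + idx;		/* advance into the batch, like ptep += idx */
	int i;

	assert(idx + nr <= BATCH_SIZE);	/* nr is always <= batch size - idx */
	for (i = 0; i < nr; i++)
		p[i] = 1;		/* "commit" entries idx .. idx + nr - 1 */
}

int main(void)
{
	int batch[BATCH_SIZE] = { 0 };
	int i;

	commit_sub_batch(batch, 2, 3);	/* touches entries 2, 3 and 4 only */
	for (i = 0; i < BATCH_SIZE; i++)
		printf("%d", batch[i]);
	printf("\n");			/* prints 00111000 */
	return 0;
}
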
+
+/*
+ * Get max length of consecutive ptes pointing to PageAnonExclusive() pages or
+ * !PageAnonExclusive() pages, starting from start_idx. Caller must enforce
+ * that the ptes point to consecutive pages of the same anon large folio.
+ */
+static int page_anon_exclusive_sub_batch(int start_idx, int max_len,
+ struct page *first_page, bool expected_anon_exclusive)
+{
+ int idx;
+ for (idx = start_idx + 1; idx < start_idx + max_len; ++idx) {

Nit but:

	int end = start_idx + max_len;

	for (idx = start_idx + 1; idx < end; idx++) {

Would be a little neater here.

I politely disagree :) start_idx + max_len is *obviously* the
end index, no need to add one more line of code asserting that.

Haha, well disagreement is permitted you know ;) as long as it's polite of
course...

That's fine, this isn't a big deal.

+ if (expected_anon_exclusive != PageAnonExclusive(first_page + idx))

Nitty again but the below might be a little clearer?

	struct page *page = &first_page[idx];

	if (expected_anon_exclusive != PageAnonExclusive(page))

I don't think so. first_page[idx] may confuse us into thinking that
we have an array of pages. Also, the way you define it assigns a
stack address to struct page *page; this is not a problem in theory
and the code will still be correct, but I would prefer struct page *page
to contain the actual address of the linear map struct page, which is
vmemmap + PFN. The way I write it is, I initialize first_page from folio_page(),
which derives the address from folio->page, and folio was derived from
vm_normal_folio() (which was derived from the PFN in the PTE); therefore
first_page will contain the actual vmemmap address of the corresponding struct page,
hence it is guaranteed that first_page + x will give me the x'th page in
the folio.

OK, I don't think this is an issue, but I'm not going to press this, as it's a
trivial thing, it's fine as-is :)
+ break;
+ }
+ return idx - start_idx;
+}
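
For what it's worth, here is a small userspace model of the sub-batch scan above, which also shows that &first_page[idx] and first_page + idx denote the same address either way. It is illustrative only: sub_batch_len and the bool array are made-up stand-ins for the PageAnonExclusive() checks on consecutive struct pages, not the kernel code.

/* Userspace model of the sub-batch scan; illustrative only, not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Length of the run of entries starting at start_idx that all share the
 * value 'expected', scanning at most max_len entries. Mirrors the shape of
 * page_anon_exclusive_sub_batch().
 */
static int sub_batch_len(int start_idx, int max_len, const bool *first,
			 bool expected)
{
	int idx;

	for (idx = start_idx + 1; idx < start_idx + max_len; ++idx) {
		if (expected != first[idx])
			break;
	}
	return idx - start_idx;
}

int main(void)
{
	/* Exclusivity pattern of an 8-entry batch: runs of 3, 2 and 3. */
	bool anon_excl[8] = { true, true, true, false, false, true, true, true };

	/* &anon_excl[3] and anon_excl + 3 are the same address in C. */
	assert(&anon_excl[3] == anon_excl + 3);

	printf("%d\n", sub_batch_len(0, 8, anon_excl, anon_excl[0])); /* 3 */
	printf("%d\n", sub_batch_len(3, 5, anon_excl, anon_excl[3])); /* 2 */
	printf("%d\n", sub_batch_len(5, 3, anon_excl, anon_excl[5])); /* 3 */
	return 0;
}
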
+
+/*
+ * This function is a result of trying our very best to retain the
+ * "avoid the write-fault handler" optimization. In can_change_pte_writable(),
+ * if the vma is a private vma, and we cannot determine whether to change
+ * the pte to writable just from the vma and the pte, we then need to look
+ * at the actual page pointed to by the pte. Unfortunately, if we have a
+ * batch of ptes pointing to consecutive pages of the same anon large folio,
+ * the anon-exclusivity (or the negation) of the first page does not guarantee
+ * the anon-exclusivity (or the negation) of the other pages corresponding to
+ * the pte batch; hence in this case it is incorrect to decide to change or
+ * not change the ptes to writable just by using information from the first
+ * pte of the batch. Therefore, we must individually check all pages and
+ * retrieve sub-batches.
+ */
Nice comment thanks.

+static void commit_anon_folio_batch(struct vm_area_struct *vma,
+ struct folio *folio, unsigned long addr, pte_t *ptep,
+ pte_t oldpte, pte_t ptent, int nr_ptes, struct mmu_gather *tlb)
+{
+ struct page *first_page = folio_page(folio, 0);
+ bool expected_anon_exclusive;
+ int sub_batch_idx = 0;
+ int len;
+
+ while (nr_ptes) {
+ expected_anon_exclusive = PageAnonExclusive(first_page + sub_batch_idx);

I'd prefer this to be:

	int i;

	...

	for (i = 0; i < nr_ptes; i += len, sub_batch_idx += len) {

We won't be able to do nr_ptes -= len with this. And personally a while loop
is clearer to me here.

Well, you don't need to :) maybe rename i to pte_idx + pass nr_ptes - pte_idx.

Buuuut I'm not going to press this, it's not a big deal, and I see your point!
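
Just to make it concrete that the two loop shapes visit the same sub-batches, here is a throwaway userspace sketch of both (run_len, excl and the loop driver are made-up stand-ins; not the kernel code):

/* Userspace sketch of the two loop shapes discussed above; not kernel code. */
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the per-sub-batch run-length scan. */
static int run_len(const bool *first, int start_idx, int max_len)
{
	int idx;

	for (idx = start_idx + 1; idx < start_idx + max_len; ++idx)
		if (first[idx] != first[start_idx])
			break;
	return idx - start_idx;
}

int main(void)
{
	bool excl[8] = { true, true, false, false, false, true, false, false };
	int nr_ptes, sub_batch_idx, pte_idx, len;

	/* Shape 1: while loop, consuming nr_ptes as we go. */
	nr_ptes = 8;
	sub_batch_idx = 0;
	while (nr_ptes) {
		len = run_len(excl, sub_batch_idx, nr_ptes);
		printf("sub-batch at %d, len %d\n", sub_batch_idx, len);
		sub_batch_idx += len;
		nr_ptes -= len;
	}

	/* Shape 2: for loop over pte_idx, passing nr_ptes - pte_idx instead. */
	nr_ptes = 8;
	sub_batch_idx = 0;
	for (pte_idx = 0; pte_idx < nr_ptes; pte_idx += len, sub_batch_idx += len) {
		len = run_len(excl, sub_batch_idx, nr_ptes - pte_idx);
		printf("sub-batch at %d, len %d\n", sub_batch_idx, len);
	}
	return 0;
}

Both passes print the same sub-batch start/length pairs, so it really is just a matter of taste.
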
Overall the R-b tag still stands with the above unchanged.
Thanks for doing this series and being open to feedback, I feel we've iterated
to something nice here!
Cheers, Lorenzo