Re: [RFC PATCH 0/2] mm/gup: fix gup_fast with dynamic page table folding

From: John Hubbard
Date: Tue Sep 01 2020 - 19:22:29 EST


On 9/1/20 10:40 AM, Gerald Schaefer wrote:
On Mon, 31 Aug 2020 12:15:53 -0700
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
...
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e8cbc2e795d5..43dacbce823f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -681,6 +681,38 @@ static inline int arch_unmap_one(struct mm_struct *mm,
})
#endif
+/*
+ * With dynamic page table levels on s390, the static pXd_addr_end() functions
+ * will not return corresponding dynamic boundaries. This is no problem as long
+ * as only pXd pointers are passed down during page table walk, because
+ * pXd_offset() will simply return the given pointer for folded levels, and the
+ * pointer iteration over a range simply happens at the correct page table
+ * level.
+ * It is however a problem with gup_fast, or other places walking the page
+ * tables w/o locks using READ_ONCE(), and passing down the pXd values instead
+ * of pointers. In this case, the pointer given to pXd_offset() is a pointer to
+ * a stack variable, which cannot be used for pointer iteration at the correct
+ * level. Instead, the iteration then has to happen by going up to pgd level
+ * again. To allow this, provide pXd_addr_end_folded() functions with an
+ * additional pXd value parameter, which can be used on s390 to determine the
+ * folding level and return the corresponding boundary.

Ah OK, I finally see what you have in mind. And as Jason noted, if we just
pass an additional parameter to pXd_addr_end() that's going to be
cleaner. And doing so puts this in line with other page table
abstractions that also carry more information than some architectures
need. For example, on x86, set_pte_at() ignores the first two
parameters:

#define set_pte_at(mm, addr, ptep, pte) native_set_pte_at(mm, addr, ptep, pte)

static inline void native_set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep , pte_t pte)
{
native_set_pte(ptep, pte);
}

This type of abstraction has worked out very well, IMHO.


thanks,
--
John Hubbard
NVIDIA