Re: [PATCH] mm/damon/vaddr: Safely walk page table

From: David Hildenbrand
Date: Tue Aug 31 2021 - 05:53:19 EST


On 27.08.21 17:04, SeongJae Park wrote:
From: SeongJae Park <sjpark@xxxxxxxxx>

Commit d7f647622761 ("mm/damon: implement primitives for the virtual
memory address spaces") of linux-mm[1] tries to find PTE or PMD for
arbitrary virtual address using 'follow_invalidate_pte()' without proper
locking[2]. This commit fixes the issue by using another page table
walk function for more general use case under proper locking.

[1] https://github.com/hnaz/linux-mm/commit/d7f647622761
[2] https://lore.kernel.org/linux-mm/3b094493-9c1e-6024-bfd5-7eca66399b7e@xxxxxxxxxx

Fixes: d7f647622761 ("mm/damon: implement primitives for the virtual memory address spaces")
Reported-by: David Hildenbrand <david@xxxxxxxxxx>
Signed-off-by: SeongJae Park <sjpark@xxxxxxxxx>
---
mm/damon/vaddr.c | 81 +++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 74 insertions(+), 7 deletions(-)

diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 230db7413278..b3677f2ef54b 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -8,10 +8,12 @@
#define pr_fmt(fmt) "damon-va: " fmt
#include <linux/damon.h>
+#include <linux/hugetlb.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/highmem.h>
#include <linux/page_idle.h>
+#include <linux/pagewalk.h>
#include <linux/random.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>
@@ -446,14 +448,69 @@ static void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm,
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
}
+struct damon_walk_private {
+ pmd_t *pmd;
+ pte_t *pte;
+ spinlock_t *ptl;
+};
+
+static int damon_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
+ struct mm_walk *walk)
+{
+ struct damon_walk_private *priv = walk->private;
+
+ if (pmd_huge(*pmd)) {
+ priv->ptl = pmd_lock(walk->mm, pmd);
+ if (pmd_huge(*pmd)) {
+ priv->pmd = pmd;
+ return 0;
+ }
+ spin_unlock(priv->ptl);
+ }
+
+ if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
+ return -EINVAL;
+ priv->pte = pte_offset_map_lock(walk->mm, pmd, addr, &priv->ptl);
+ if (!pte_present(*priv->pte)) {
+ pte_unmap_unlock(priv->pte, priv->ptl);
+ priv->pte = NULL;
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static struct mm_walk_ops damon_walk_ops = {
+ .pmd_entry = damon_pmd_entry,
+};
+
+int damon_follow_pte_pmd(struct mm_struct *mm, unsigned long addr,
+ struct damon_walk_private *private)
+{
+ int rc;
+
+ private->pte = NULL;
+ private->pmd = NULL;
+ rc = walk_page_range(mm, addr, addr + 1, &damon_walk_ops, private);
+ if (!rc && !private->pte && !private->pmd)
+ return -EINVAL;
+ return rc;
+}
+
static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
{
- pte_t *pte = NULL;
- pmd_t *pmd = NULL;
+ struct damon_walk_private walk_result;
+ pte_t *pte;
+ pmd_t *pmd;
spinlock_t *ptl;
- if (follow_invalidate_pte(mm, addr, NULL, &pte, &pmd, &ptl))
+ mmap_write_lock(mm);

Can you elaborate on why mmap_read_lock() isn't sufficient for your use case? Taking the lock in write mode might heavily affect DAMON's performance and its impact on the workload.


Also, I wonder if it wouldn't be much easier and cleaner to handle it completely in the .pmd_entry callback, instead of returning PMDs, PTEs, locks, and so on here.

You could have

static struct mm_walk_ops damon_mkold_ops = {
	.pmd_entry = damon_mkold_pmd_entry,
};

and

static struct mm_walk_ops damon_young_ops = {
	.pmd_entry = damon_young_pmd_entry,
};

And then just handle everything completely inside the callback. That avoids having to return locked PTEs, PMDs, and so on, and handles it all at a single location instead. In the latter case, simply forward the page_sz pointer to damon_young_ops.


damon_va_mkold()/damon_va_young() would then mostly only call walk_page_range() with the right ops and possibly convert some return values.
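
The structure suggested above can be illustrated with a small userspace analogue. This is only a sketch, not kernel code: every name prefixed with mock_ or MOCK_ is hypothetical, the flat table stands in for real page tables, and the lock_depth counter stands in for the page table lock. It shows just the shape of the design: each callback takes the lock, does its work, and drops the lock before returning, so the callers never see a locked entry. Passing the result through a private struct is one assumed way of forwarding page_sz.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096UL	/* hypothetical page size for the mock */

/* Userspace stand-in for one page table entry; not a kernel type. */
struct mock_entry {
	bool present;	/* analogue of pte_present() */
	bool young;	/* analogue of the accessed bit */
	int lock_depth;	/* analogue of the page table lock; 0 means unlocked */
};

struct mock_walk;

/* Analogue of struct mm_walk_ops with only a .pmd_entry-like hook. */
struct mock_walk_ops {
	int (*entry)(struct mock_entry *e, unsigned long addr,
		     struct mock_walk *walk);
};

struct mock_walk {
	const struct mock_walk_ops *ops;
	void *priv;	/* analogue of walk->private */
};

/* Analogue of walk_page_range() over a flat table indexed by address. */
static int mock_walk_range(struct mock_entry *table, size_t nr,
			   unsigned long start, unsigned long end,
			   const struct mock_walk_ops *ops, void *priv)
{
	struct mock_walk walk = { .ops = ops, .priv = priv };
	unsigned long addr;
	int rc;

	assert(start < end && end <= nr);
	for (addr = start; addr < end; addr++) {
		rc = ops->entry(&table[addr], addr, &walk);
		if (rc)
			return rc;
	}
	return 0;
}

/* Clears the young bit entirely inside the callback, under the lock. */
static int mock_mkold_entry(struct mock_entry *e, unsigned long addr,
			    struct mock_walk *walk)
{
	(void)addr;
	(void)walk;
	e->lock_depth++;		/* take the (mock) lock */
	if (e->present)
		e->young = false;
	e->lock_depth--;		/* drop it before returning */
	return 0;
}

/* Result forwarded through the private pointer (assumed layout). */
struct mock_young_result {
	bool young;
	unsigned long *page_sz;
};

/* Checks the young bit entirely inside the callback, under the lock. */
static int mock_young_entry(struct mock_entry *e, unsigned long addr,
			    struct mock_walk *walk)
{
	struct mock_young_result *res = walk->priv;

	(void)addr;
	e->lock_depth++;
	if (e->present && e->young) {
		res->young = true;
		*res->page_sz = MOCK_PAGE_SIZE;
	}
	e->lock_depth--;
	return 0;
}

static const struct mock_walk_ops mock_mkold_ops = {
	.entry = mock_mkold_entry,
};

static const struct mock_walk_ops mock_young_ops = {
	.entry = mock_young_entry,
};

/* The callers shrink to a single walk call with the right ops. */
void mock_va_mkold(struct mock_entry *table, size_t nr, unsigned long addr)
{
	mock_walk_range(table, nr, addr, addr + 1, &mock_mkold_ops, NULL);
}

bool mock_va_young(struct mock_entry *table, size_t nr, unsigned long addr,
		   unsigned long *page_sz)
{
	struct mock_young_result res = { .young = false, .page_sz = page_sz };

	mock_walk_range(table, nr, addr, addr + 1, &mock_young_ops, &res);
	return res.young;
}
```

With this layout, neither mock_va_mkold() nor mock_va_young() ever holds a pointer to a locked entry, so no unlock path can be forgotten in the callers.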

--
Thanks,

David / dhildenb