Re: [RFC PATCH 1/4] mm/mempolicy: Expose policy_nodemask() in include/linux/mempolicy.h

From: David Hildenbrand
Date: Mon Jun 16 2025 - 05:47:29 EST


On 13.06.25 18:33, Bijan Tabatabai wrote:
On Fri, Jun 13, 2025 at 8:45 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 12.06.25 20:13, Bijan Tabatabai wrote:
From: Bijan Tabatabai <bijantabatab@xxxxxxxxxx>

This patch is to allow DAMON to call policy_nodemask() so it can
determine where to place a page for interleaving.

Signed-off-by: Bijan Tabatabai <bijantabatab@xxxxxxxxxx>
---
include/linux/mempolicy.h | 9 +++++++++
mm/mempolicy.c | 4 +---
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 0fe96f3ab3ef..e96bf493ff7a 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -133,6 +133,8 @@ struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
unsigned long addr, int order, pgoff_t *ilx);
bool vma_policy_mof(struct vm_area_struct *vma);
+nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
+ pgoff_t ilx, int *nid);

extern void numa_default_policy(void);
extern void numa_policy_init(void);
@@ -232,6 +234,13 @@ static inline struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
return NULL;
}

+static inline nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
+ pgoff_t ilx, int *nid)
+{
+ *nid = NUMA_NO_NODE;
+ return NULL;
+}
+
static inline int
vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst)
{
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3b1dfd08338b..54f539497e20 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -596,8 +596,6 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {

static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
unsigned long flags);
-static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
- pgoff_t ilx, int *nid);

static bool strictly_unmovable(unsigned long flags)
{
@@ -2195,7 +2193,7 @@ static unsigned int interleave_nid(struct mempolicy *pol, pgoff_t ilx)
* Return a nodemask representing a mempolicy for filtering nodes for
* page allocation, together with preferred node id (or the input node id).
*/
-static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
+nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
pgoff_t ilx, int *nid)
{
nodemask_t *nodemask = NULL;

You actually only care about the nid for your use case.

Maybe we should add

get_vma_policy_node() that internally does a get_vma_policy() to then
give you only the node back.

If get_vma_policy() is not the right thing (see my reply to patch #2),
of course a get_task_policy_node() could be added.

--
Cheers,

David / dhildenb

Hi David,

Hi,


I did not use get_vma_policy or mpol_misplaced, which I believe is the
closest function that exists for what I want in this patch, because
those functions

I think what you mean is that you are performing an rmap walk. But there, you do have a (stable) VMA + MM available.

seem to assume they are called inside of the task that the folio/vma
is mapped to.

But, we do have a VMA at hand, so why would we want to ignore any set policy? (I think VMA policies so far only apply to shmem, but still).

I really think you want to use get_vma_policy() instead of the task policy.


More specifically, mpol_misplaced assumes it is being called within a
page fault. This doesn't work for us, because we call it inside of a
kdamond process.

Right.

But it uses the vmf only for ...

1) Obtaining the VMA
2) Sanity-checking that the ptlock is held.

Both of which you also have during the rmap walk.


So what about factoring out that handling from mpol_misplaced(), having another function where you pass the VMA instead of the vmf?
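
To illustrate the shape of that split, here is a rough userspace model (plain C, not kernel code, untested against any tree). All toy_* names and struct layouts are made up for illustration; the real policy lookup is reduced to a single "wanted node" field. The only point is that the fault path becomes a thin wrapper around a core that takes the VMA directly:

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Toy stand-ins; the real kernel structures carry far more state. */
struct toy_vma   { int policy_nid; };        /* node the policy asks for */
struct toy_vmf   { struct toy_vma *vma; };   /* fault context */
struct toy_folio { int nid; };               /* node the folio is on */

/* Hypothetical VMA-based core: usable from fault and non-fault callers. */
static int toy_misplaced_vma(const struct toy_folio *folio,
                             const struct toy_vma *vma)
{
    /* Stands in for the sanity checks done on entry. */
    assert(folio != NULL && vma != NULL);

    if (vma->policy_nid == NUMA_NO_NODE || vma->policy_nid == folio->nid)
        return NUMA_NO_NODE;    /* not misplaced */
    return vma->policy_nid;     /* node the folio should move to */
}

/* Fault-path wrapper: the vmf is only used to look up the VMA. */
static int toy_misplaced_fault(const struct toy_folio *folio,
                               const struct toy_vmf *vmf)
{
    return toy_misplaced_vma(folio, vmf->vma);
}
```

A caller doing an rmap walk would call the VMA-based variant directly, while the fault handler keeps going through the wrapper.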


I would be open to adding a new function that takes in a folio, vma,
address, and task_struct and returns the nid the folio should be placed
on. It could possibly be implemented as a function internal to
mpol_misplaced because the two would be very similar.

Good, you had the same thought :)


How would you propose we handle MPOL_BIND and MPOL_PREFERRED_MANY
in this function? mpol_misplaced chooses a nid based on the node and
cpu the fault occurred on, which we wouldn't have in a kdamond context.
The two options I see are either:
1. return the nid of the first node in the policy's nodemask
2. return NUMA_NO_NODE
I think I would lean towards the first.

I guess we'd need a way for your new helper to deal with both cases (is_fault vs. !is_fault), and make a decision based on that.
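
Something along these lines, as a toy userspace model only (plain C, not kernel code). The nodemask is a bare unsigned long, the toy_* names and the is_fault parameter are hypothetical, and the fault path is modelled very loosely as "prefer the local node if the policy allows it". The !is_fault branch implements option 1 from above:

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define TOY_MAX_NODES ((int)(8 * sizeof(unsigned long)))

/* True if nid is a valid bit index and set in the toy nodemask. */
static bool toy_node_allowed(unsigned long allowed, int nid)
{
    return nid >= 0 && nid < TOY_MAX_NODES && (allowed & (1UL << nid));
}

/* Lowest node in the toy nodemask, or NUMA_NO_NODE if it is empty. */
static int toy_first_node(unsigned long allowed)
{
    for (int nid = 0; nid < TOY_MAX_NODES; nid++)
        if (allowed & (1UL << nid))
            return nid;
    return NUMA_NO_NODE;
}

/* Hypothetical helper: one decision function for both kinds of caller. */
static int toy_target_nid(unsigned long allowed, int folio_nid,
                          int local_nid, bool is_fault)
{
    assert(folio_nid >= 0);     /* a folio always sits on some node */

    /* Folio already on an allowed node: nothing to do. */
    if (toy_node_allowed(allowed, folio_nid))
        return NUMA_NO_NODE;

    /* Only a fault gives us a meaningful "local" node to prefer. */
    if (is_fault && toy_node_allowed(allowed, local_nid))
        return local_nid;

    /* No fault context: fall back to the first node in the mask. */
    return toy_first_node(allowed);
}
```

Switching the fallback to option 2 would just mean returning NUMA_NO_NODE in the last step instead.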


For your use case, you can then decide what would be appropriate. It's a good question what the appropriate action would be: 1) sounds better, but I do wonder if we would rather want to distribute the folios in a different way across applicable nodes, not sure ...
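
Purely to illustrate what "distribute" could mean here, a toy userspace model (plain C, not kernel code): pick the n-th allowed node, with n derived from some per-folio index. The toy_* names are made up, and the choice of index (e.g. something offset-based) is an assumption, not a proposal:

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)
#define TOY_NODE_BITS ((int)(8 * sizeof(unsigned long)))

/* Number of nodes set in the toy nodemask. */
static int toy_count_nodes(unsigned long allowed)
{
    int count = 0;

    for (; allowed; allowed &= allowed - 1)     /* clear lowest set bit */
        count++;
    return count;
}

/* n-th (0-based) node set in the toy nodemask. */
static int toy_nth_node(unsigned long allowed, int n)
{
    assert(n >= 0 && n < toy_count_nodes(allowed));

    for (int nid = 0; nid < TOY_NODE_BITS; nid++) {
        if (!(allowed & (1UL << nid)))
            continue;
        if (n-- == 0)
            return nid;
    }
    return NUMA_NO_NODE;    /* not reached while the assert holds */
}

/* Hypothetical: spread folios over all allowed nodes by index. */
static int toy_spread_nid(unsigned long allowed, unsigned long index)
{
    int count = toy_count_nodes(allowed);

    if (count == 0)
        return NUMA_NO_NODE;
    return toy_nth_node(allowed, (int)(index % (unsigned long)count));
}
```

With nodes 0, 1 and 3 allowed, consecutive indexes would cycle through 0, 1, 3, 0, ... instead of everything landing on node 0.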

--
Cheers,

David / dhildenb