Re: [RFC PATCH 1/4] mm/mempolicy: Expose policy_nodemask() in include/linux/mempolicy.h
From: Bijan Tabatabai
Date: Mon Jun 16 2025 - 10:22:15 EST
On Mon, Jun 16, 2025 at 4:46 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 13.06.25 18:33, Bijan Tabatabai wrote:
> > On Fri, Jun 13, 2025 at 8:45 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 12.06.25 20:13, Bijan Tabatabai wrote:
[...]
> Hi,
>
> >
> > I did not use get_vma_policy or mpol_misplaced, which I believe is the
> > closest function that exists for what I want in this patch, because
> > those functions
>
> I think what you mean is, that you are performing an rmap walk. But
> there, you do have a VMA + MM available (stable).
>
> > seem to assume they are called inside of the task that the folio/vma
> > is mapped to.
>
> But, we do have a VMA at hand, so why would we want to ignore any set
> policy? (I think VMA policies so far only apply to shmem, but still).
>
> I really think you want to use get_vma_policy() instead of the task policy.
Sorry, I think I misunderstood you before. You are right, we should
consider the VMA policy before using the task policy. I will do this
in the next revision.
>
> > More specifically, mpol_misplaced assumes it is being called within a
> > page fault.
> > This doesn't work for us, because we call it inside of a kdamond process.
>
> Right.
>
> But it uses the vmf only for ...
>
> 1) Obtaining the VMA
> 2) Sanity-checking that the ptlock is held.
>
> Which, you also have during the rmap walk.
There is another subtle dependency in get_vma_policy.
It first checks if a VMA policy exists, and if it doesn't, it uses the
task policy of the current task, which doesn't make sense when called
by a kdamond thread.
However, I don't think this will change what seems to be our consensus
of adding a new helper function.
>
> So what about factoring out that handling from mpol_misplaced(), having
> another function where you pass the VMA instead of the vmf?
>
> >
> > I would be open to adding a new function that takes in a folio, vma,
> > address, and
> > task_struct and returns the nid the folio should be placed on. It could possibly
> > be implemented as a function internal to mpol_misplaced because the two would
> > be very similar.
>
> Good, you had the same thought :)
>
> >
> > How would you propose we handle MPOL_BIND and MPOL_PREFFERED_MANY
> > in this function? mpol_misplaced chooses a nid based on the node and
> > cpu the fault
> > occurred on, which we wouldn't have in a kdamond context. The two options I see
> > are either:
> > 1. return the nid of the first node in the policy's nodemask
> > 2. return NUMA_NO_NODE
> > I think I would lean towards the first.
>
> I guess we'd need a way for your new helper to deal with both cases
> (is_fault vs. !is_fault), and make a decision based on that.
>
>
> For your use case, you can then decide what would be appropriate. It's a
> good question what the appropriate action would be: 1) sounds better,
> but I do wonder if we would rather want to distribute the folios in a
> different way across applicable nodes, not sure ...
Yes, I was thinking about that too, but I felt that adding state
somewhere or using randomness to distribute the folios was incorrect,
especially since those policies are not the focus of this patchset.
I think I'll move forward with option 1 for now.
Thanks,
Bijan