Re: [RFC PATCH 1/4] mm: use current as mmu notifier's owner
From: Jason Gunthorpe
Date: Thu Aug 14 2025 - 09:04:32 EST
On Thu, Aug 14, 2025 at 03:53:00PM +0300, Mika Penttilä wrote:
>
> On 8/14/25 15:40, Jason Gunthorpe wrote:
> > On Thu, Aug 14, 2025 at 10:19:26AM +0300, Mika Penttilä wrote:
> >> When doing migration in combination with device fault handling,
> >> detect our own invalidations in the interval notifier.
> >>
> >> Without that, we would livelock with our own invalidations
> >> while migrating and splitting pages during fault handling.
> >>
> >> Note that pgmap_owner, which some other code paths use as the owner
> >> for filtering, is not readily available in the split path, so use
> >> current for this use case. Also, since current and pgmap_owner are
> >> both pointers to memory, one cannot be mistaken for the other.
> >>
> >> Cc: David Hildenbrand <david@xxxxxxxxxx>
> >> Cc: Jason Gunthorpe <jgg@xxxxxxxxxx>
> >> Cc: Leon Romanovsky <leonro@xxxxxxxxxx>
> >> Cc: Alistair Popple <apopple@xxxxxxxxxx>
> >> Cc: Balbir Singh <balbirs@xxxxxxxxxx>
> >>
> >> Signed-off-by: Mika Penttilä <mpenttil@xxxxxxxxxx>
> >> ---
> >>  lib/test_hmm.c   | 5 +++++
> >>  mm/huge_memory.c | 6 +++---
> >>  mm/rmap.c        | 4 ++--
> >> 3 files changed, 10 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> >> index 761725bc713c..cd5c139213be 100644
> >> --- a/lib/test_hmm.c
> >> +++ b/lib/test_hmm.c
> >> @@ -269,6 +269,11 @@ static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,
> >>  	    range->owner == dmirror->mdevice)
> >>  		return true;
> >>
> >> +	if (range->event == MMU_NOTIFY_CLEAR &&
> >> +	    range->owner == current) {
> >> +		return true;
> >> +	}
> > I don't understand this; there is nothing in hmm that says only
> > current can call hmm_range_fault(), and indeed most applications
> > won't even guarantee that.
>
> No, it's the opposite: if we are ourselves the invalidator, we don't care.
I think you've missed the point: you cannot use range->owner in any
way to tell that "we are ourselves the invalidator". That is simply
not the meaning of range->owner.
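For reference, range->owner is just a tag supplied by whoever
initializes the invalidation range. The established use is the migrate
path tagging its own invalidation with the pgmap owner so the driver
can filter out invalidations it triggered itself for its device-private
pages. A sketch of that pattern (abbreviated from what
mm/migrate_device.c and the existing test_hmm filter do):

	/* invalidation side: migration tags the range with the pgmap owner */
	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_MIGRATE, 0,
				      migrate->vma->vm_mm, migrate->start,
				      migrate->end, migrate->pgmap_owner);
	mmu_notifier_invalidate_range_start(&range);

	/* driver side: ignore invalidations for our own device-private pages */
	if (range->event == MMU_NOTIFY_MIGRATE &&
	    range->owner == dmirror->mdevice)
		return true;

current is not such a tag, and nothing guarantees the thread running
the notifier callback is the thread that did the fault.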
> > So if this plan relies on something like the above in drivers I don't
> > see how it can work.
> >
> > If this is just some hack for tests, try instead to find a solution
> > that more accurately matches what a real driver should do.
> >
> > But this also seems overall troublesome for your goal: if you do a
> > migrate inside hmm_range_fault() it will generate an invalidation
> > callback, that will increment the seqlock, and we will loop
> > hmm_range_fault() again, which rewalks.
>
> That's the problem this solves.
> The semantics are "if we are the invalidator, don't wait for the
> invalidate to end", i.e. don't mmu_interval_set_seq(), which would
> otherwise make the next mmu_interval_read_begin() hang waiting for
> the invalidate to end.
I doubt we can skip mmu_interval_set_seq(); doing so can corrupt a
concurrent hmm_range_fault() in some other thread. Nor, as I said, can
we actually skip the invalidation toward the HW anyhow.
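To spell out why: the usual pattern around hmm_range_fault() (a sketch
along the lines of Documentation/mm/hmm.rst; driver_lock is
illustrative) relies on mmu_interval_set_seq() being called for every
overlapping invalidation:

again:
	range.notifier_seq = mmu_interval_read_begin(&mni);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;
		return ret;
	}

	mutex_lock(&driver_lock);	/* also taken by the invalidate callback */
	if (mmu_interval_read_retry(&mni, range.notifier_seq)) {
		mutex_unlock(&driver_lock);
		goto again;
	}
	/* program the device page tables from range.hmm_pfns */
	mutex_unlock(&driver_lock);

mmu_interval_set_seq() in the invalidate callback is what makes
mmu_interval_read_retry() fail for every walker that overlapped the
invalidation. Skip it for one caller and a concurrent thread's walk
never sees the collision and can program the device from stale PTEs.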
At the very least this commit message fails to explain the new locking
proposal, or justify why it can work.
Jason