Re: [PATCH 4/5] mm/hmm: hmm_vma_fault() doesn't always call hmm_range_unregister()

From: Ralph Campbell
Date: Thu Jun 06 2019 - 17:12:28 EST



On 6/6/19 12:54 PM, Jason Gunthorpe wrote:
On Thu, Jun 06, 2019 at 12:44:36PM -0700, Ralph Campbell wrote:

On 6/6/19 7:50 AM, Jason Gunthorpe wrote:
On Mon, May 06, 2019 at 04:29:41PM -0700, rcampbell@xxxxxxxxxx wrote:
From: Ralph Campbell <rcampbell@xxxxxxxxxx>

The helper function hmm_vma_fault() calls hmm_range_register() but is
missing a call to hmm_range_unregister() in one of the error paths.
This leads to a reference count leak and ultimately a memory leak on
struct hmm.

Always call hmm_range_unregister() if hmm_range_register() succeeded.

Signed-off-by: Ralph Campbell <rcampbell@xxxxxxxxxx>
Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Balbir Singh <bsingharora@xxxxxxxxx>
Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Souptick Joarder <jrdr.linux@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---
 include/linux/hmm.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 35a429621e1e..fa0671d67269 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
 		return (int)ret;
 	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
+		hmm_range_unregister(range);
 		/*
 		 * The mmap_sem was taken by driver we release it here and
 		 * returns -EAGAIN which correspond to mmap_sem have been
@@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
 	ret = hmm_range_fault(range, block);
 	if (ret <= 0) {
+		hmm_range_unregister(range);

While this seems to be a clear improvement, it seems there is still a
bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
never calls hmm_range_unregister() for its on stack range - and
hmm_vma_fault() still returns with the range registered.

As hmm_vma_fault() is only used by nouveau and is marked as
deprecated, I think we need to fix nouveau, either by dropping
hmm_vma_fault(), or by adding the missing unregister to nouveau in
this patch.
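
To make the leak described above concrete, here is a minimal user-space
sketch. It is not kernel code: struct mock_range, live_refs and the
mock_*() functions are hypothetical stand-ins for struct hmm_range, the
reference held on struct hmm, and the HMM calls. The error mapping is
simplified and mmap_sem handling is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct hmm_range; not the kernel type. */
struct mock_range {
	int registered;		/* 1 while the range holds a reference */
};

/* Models the reference count that registered ranges hold on struct hmm. */
static int live_refs;

static int mock_register(struct mock_range *r)
{
	r->registered = 1;
	live_refs++;
	return 0;
}

static void mock_unregister(struct mock_range *r)
{
	assert(r->registered && live_refs > 0);
	r->registered = 0;
	live_refs--;
}

/*
 * Models the patched helper: every failure path unregisters, but a
 * successful call returns with the range still registered.
 * fault_result <= 0 is treated as failure, as in the diff.
 */
static int mock_vma_fault(struct mock_range *r, int fault_result)
{
	int ret = mock_register(r);

	if (ret)
		return ret;
	if (fault_result <= 0) {
		mock_unregister(r);
		/* Simplified error mapping, an assumption of this sketch. */
		return fault_result < 0 ? fault_result : -EBUSY;
	}
	return 0;
}

/* Caller that never unregisters its on-stack range: leaks on success. */
static int leaky_caller(int fault_result)
{
	struct mock_range range = { 0 };

	return mock_vma_fault(&range, fault_result);
}

/* Caller that adds the missing unregister after a successful fault. */
static int careful_caller(int fault_result)
{
	struct mock_range range = { 0 };
	int ret = mock_vma_fault(&range, fault_result);

	if (ret == 0)
		mock_unregister(&range);
	return ret;
}
```

Starting from live_refs == 0, leaky_caller(1) leaves one reference
behind, while careful_caller(1) brings the count back to zero. Both
leave it at zero when the fault fails, since the helper cleans up there.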

I will send a patch for nouveau to use hmm_range_register() and
hmm_range_fault() and do some testing with OpenCL.
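
As a rough illustration of what open-coding the two calls could look
like, the sketch below pairs register and unregister through a single
exit label. It is a user-space model only: the sim_*() names and
struct sim_range are hypothetical, sim_range_fault() just returns the
outcome it is given, and the wait-until-valid step and mmap_sem handling
are omitted. It is not the nouveau patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct hmm_range; not the kernel type. */
struct sim_range {
	int registered;
};

/* Models the references that registered ranges hold on struct hmm. */
static int sim_refs;

static int sim_range_register(struct sim_range *r)
{
	r->registered = 1;
	sim_refs++;
	return 0;
}

static void sim_range_unregister(struct sim_range *r)
{
	assert(r->registered && sim_refs > 0);
	r->registered = 0;
	sim_refs--;
}

/* Stand-in fault: returns the outcome chosen by the caller. */
static long sim_range_fault(struct sim_range *r, long outcome)
{
	(void)r;
	return outcome;
}

/*
 * Driver-side fault path with one cleanup point: once registration has
 * succeeded, every return goes through "out" and unregisters.
 */
static int sim_driver_fault(long outcome)
{
	struct sim_range range = { 0 };
	long ret;

	ret = sim_range_register(&range);
	if (ret)
		return (int)ret;

	ret = sim_range_fault(&range, outcome);
	if (ret <= 0) {
		/* Treat 0 as "retry"; this mapping is an assumption. */
		if (ret == 0)
			ret = -EBUSY;
		goto out;
	}

	/* A real driver would consume the faulted pages here. */
	ret = 0;
out:
	sim_range_unregister(&range);
	return (int)ret;
}
```

With this shape the range is on the stack, and the count in sim_refs is
back to zero after every call, whether the fault succeeded or failed.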

Wow, thanks. I'd also really like to send such a thing through
hmm.git - do you know who the nouveau maintainers are so we can
collaborate on planning this patch?

Ben Skeggs <bskeggs@xxxxxxxxxx> is the maintainer and
nouveau@xxxxxxxxxxxxxxxxxxxxx is the mailing list for changes.
I'll be sure to CC them for the patch.

I can also send a separate patch to then remove hmm_vma_fault()
but I guess that should be after AMD's changes.

Let us wait to hear back from AMD how they can consume hmm.git - I'd
very much like to get everything done in one kernel cycle!

Regards,
Jason