Re: [PATCH] userfaultfd: hugetlbfs: add UFFDIO_COPY support for shared mappings

From: Andrea Arcangeli
Date: Fri Feb 17 2017 - 16:35:12 EST


On Fri, Feb 17, 2017 at 01:08:55PM -0800, Andrew Morton wrote:
> I had a bunch more rejects to fix in that function. Below is the final
> result - please check it carefully.

Sure, reviewed, and this is the diff that remains (the location of the
vm_shared assignment is irrelevant, I put it at the end as it's only
needed later and isn't checked in the out_unlock path; the second
err = -EINVAL is also fine to keep):

diff --git a/tmp/x b/mm/userfaultfd.c
index a3ba029..3ec9aad 100644
--- a/tmp/x
+++ b/mm/userfaultfd.c
@@ -63,22 +212,17 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		    dst_start + len > dst_vma->vm_end)
 			goto out_unlock;

-		vm_shared = dst_vma->vm_flags & VM_SHARED;
-
 		err = -EINVAL;
 		if (vma_hpagesize != vma_kernel_pagesize(dst_vma))
 			goto out_unlock;
+
+		vm_shared = dst_vma->vm_flags & VM_SHARED;
 	}

-	err = -EINVAL;
 	if (WARN_ON(dst_addr & (vma_hpagesize - 1) ||
 		    (len - copied) & (vma_hpagesize - 1)))
 		goto out_unlock;

-	if (dst_start < dst_vma->vm_start ||
-	    dst_start + len > dst_vma->vm_end)
-		goto out_unlock;
-
 	/*
 	 * If not shared, ensure the dst_vma has a anon_vma.
 	 */


In short, only the last 4 changed lines of the above (the removal of
the duplicate range check) need to be applied.

In the first pass (i.e. dst_vma not NULL), __mcopy_atomic_hugetlb is
invoked after those checks have already been run in the caller:

	if (dst_start < dst_vma->vm_start ||
	    dst_start + len > dst_vma->vm_end)
		goto out_unlock;

	err = -EINVAL;
	/*
	 * shmem_zero_setup is invoked in mmap for MAP_ANONYMOUS|MAP_SHARED but
	 * it will overwrite vm_ops, so vma_is_anonymous must return false.
	 */
	if (WARN_ON_ONCE(vma_is_anonymous(dst_vma) &&
	    dst_vma->vm_flags & VM_SHARED))
		goto out_unlock;

	/*
	 * If this is a HUGETLB vma, pass off to appropriate routine
	 */
	if (is_vm_hugetlb_page(dst_vma))
		return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
						src_start, len, zeropage);

As usual, hugetlbfs takes its own tangent out of the main VM code after
various checks that apply to hugetlbfs too have already been done.

In the "retry" case dst_vma is set to NULL, so it is looked up again
and revalidated, and there we do repeat the check. The first time
through, the check isn't needed because the caller already ran it; on
the retry it has just been done in the dst_vma == NULL branch, so
running it once more would be a repetition and hence a noop.
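
To make the argument concrete, here is a minimal userspace sketch of
the same "validate in the caller, revalidate only on retry" pattern.
It is not kernel code: struct range, lookup() and copy_range() are
hypothetical stand-ins for the vma, the vma lookup and
__mcopy_atomic_hugetlb, and the checks counter exists only for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma: just a [start, end) address range. */
struct range {
	unsigned long start, end;
};

/* Hypothetical lookup: return the range containing addr, or NULL. */
static struct range *lookup(struct range *tbl, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= tbl[i].start && addr < tbl[i].end)
			return &tbl[i];
	return NULL;
}

/*
 * Model of the callee. A non-NULL r means the caller has already
 * validated it (first pass), so no bounds check runs here. A NULL r
 * models the retry: the range is looked up again and revalidated.
 * *checks counts how many bounds checks this function performed.
 */
static int copy_range(struct range *tbl, size_t n, struct range *r,
		      unsigned long start, unsigned long len, int *checks)
{
	assert(checks != NULL);

	if (!r) {
		r = lookup(tbl, n, start);
		if (!r)
			return -ENOENT;

		(*checks)++;
		if (start < r->start || start + len > r->end)
			return -EINVAL;
	}

	/* A second bounds check here would only repeat one already done. */
	return 0;
}
```

Calling copy_range() with a range the caller already validated runs no
bounds check in the callee, while passing NULL (the retry) runs it
exactly once inside the lookup branch. A further unconditional check
after the branch would never catch anything new in either case.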