Re: [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY

From: Mina Almasry
Date: Wed May 12 2021 - 03:45:19 EST


On Tue, May 11, 2021 at 11:58 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>
> When hugetlb_mcopy_atomic_pte() is called with:
> - mode==MCOPY_ATOMIC_NORMAL and,
> - we already have a page in the page cache corresponding to the
> associated address,
>
> We will allocate a huge page from the reserves, and then fail to insert it
> into the cache and return -EEXIST.
>
> In this case, we need to return -EEXIST without allocating a new page as
> the page already exists in the cache. Allocating the extra page causes
> the resv_huge_pages to underflow temporarily until the extra page is
> freed.
>
> Also, add the warning so that future similar instances of resv_huge_pages
> underflowing will be caught.
>
> Also, minor drive-by cleanups to this code path:
> - pagep is an out param never set by calling code, so delete code
> assuming there could be a valid page in there.
> - use hugetlbfs_pagecache_page() instead of repeating its
> implementation.
>
> Tested using:
> ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 1024 200 \
> /mnt/huge
>
> Test passes, and dmesg shows no underflow warnings.
>
> Signed-off-by: Mina Almasry <almasrymina@xxxxxxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> Cc: Peter Xu <peterx@xxxxxxxxxx>
>
> ---
> mm/hugetlb.c | 33 ++++++++++++++++++++-------------
> 1 file changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 629aa4c2259c..40f4ad1bca29 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1165,6 +1165,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
> page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
> if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
> SetHPageRestoreReserve(page);
> + WARN_ON_ONCE(!h->resv_huge_pages);
> h->resv_huge_pages--;
> }
>
> @@ -4868,30 +4869,39 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> struct page **pagep)
> {
> bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> - struct address_space *mapping;
> - pgoff_t idx;
> + struct hstate *h = hstate_vma(dst_vma);
> + struct address_space *mapping = dst_vma->vm_file->f_mapping;
> + pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> unsigned long size;
> int vm_shared = dst_vma->vm_flags & VM_SHARED;
> - struct hstate *h = hstate_vma(dst_vma);
> pte_t _dst_pte;
> spinlock_t *ptl;
> - int ret;
> + int ret = -ENOMEM;
> struct page *page;
> int writable;
>
> - mapping = dst_vma->vm_file->f_mapping;
> - idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> + /* Out parameter. */
> + WARN_ON(*pagep);
>
> if (is_continue) {
> ret = -EFAULT;
> - page = find_lock_page(mapping, idx);
> + page = hugetlbfs_pagecache_page(h, dst_vma, dst_addr);
> if (!page)
> goto out;
> - } else if (!*pagep) {
> - ret = -ENOMEM;
> + } else {
> + /* If a page already exists, then it's UFFDIO_COPY for
> + * a non-missing case. Return -EEXIST.
> + */
> + if (hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
> + ret = -EEXIST;
> + goto out;
> + }
> +
> page = alloc_huge_page(dst_vma, dst_addr, 0);
> - if (IS_ERR(page))
> + if (IS_ERR(page)) {
> + ret = -ENOMEM;
> goto out;
> + }
>
> ret = copy_huge_page_from_user(page,
> (const void __user *) src_addr,
> @@ -4904,9 +4914,6 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> /* don't free the page */
> goto out;
> }
> - } else {
> - page = *pagep;
> - *pagep = NULL;
> }
>
> /*
> --
> 2.31.1.607.g51e8a6a459-goog

I just realized I missed CCing Andrew and the mailing lists to this
patch's review. I'll collect review comments from folks and send a v2
to the correct folks and mailing lists.