Re: [PATCH 2/2] vmsplice: Add limited zero copy to vmsplice

From: Dave Hansen
Date: Tue Oct 08 2013 - 12:46:08 EST


On 10/07/2013 01:21 PM, Robert C Jennings wrote:
> +	if (!buf->offset && (buf->len == PAGE_SIZE) &&
> +	    (buf->flags & PIPE_BUF_FLAG_GIFT) && (sd->flags & SPLICE_F_MOVE)) {
> +		struct page *page = buf->page;
> +		struct mm_struct *mm;
> +		struct vm_area_struct *vma;
> +		spinlock_t *ptl;
> +		pte_t *ptep, pte;
> +		unsigned long useraddr;
> +
> +		if (!PageAnon(page))
> +			goto copy;
> +		if (PageCompound(page))
> +			goto copy;
> +		if (PageHuge(page) || PageTransHuge(page))
> +			goto copy;
> +		if (page_mapped(page))
> +			goto copy;

I'd really like to see a comment explaining each of those cases. You
touched on page_mapped() above, but could you repeat some of that in a
comment?
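
As a userspace model of what such comments might look like, each bail-out
below gets a one-line comment. The struct, enum and model_* names are
hypothetical flags standing in for PageAnon(), PageCompound(),
PageHuge()/PageTransHuge() and page_mapped(); the rationale in the
comments is an assumption about intent and should be replaced by the
author's real reasoning.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the page predicates; not struct page. */
struct model_page {
	bool anon;
	bool compound;
	bool huge;
	bool trans_huge;
	bool mapped;
};

enum model_bail {
	MODEL_MOVE_OK = 0,
	MODEL_BAIL_NOT_ANON,
	MODEL_BAIL_COMPOUND,
	MODEL_BAIL_HUGE,
	MODEL_BAIL_MAPPED,
};

/* Returns MODEL_MOVE_OK if the page may be moved, else why we copy. */
static enum model_bail model_zero_copy_bail_reason(const struct model_page *p)
{
	/* Assumed: only anonymous pages can go through page_add_anon_rmap(). */
	if (!p->anon)
		return MODEL_BAIL_NOT_ANON;
	/* A single PTE maps exactly one base page. */
	if (p->compound)
		return MODEL_BAIL_COMPOUND;
	/* Same reasoning for hugetlbfs and THP pages. */
	if (p->huge || p->trans_huge)
		return MODEL_BAIL_HUGE;
	/* Assumed: still mapped elsewhere, so moving it would leave it shared. */
	if (p->mapped)
		return MODEL_BAIL_MAPPED;
	return MODEL_MOVE_OK;
}
```

Checks are evaluated in the same order as in the patch, so the first
matching reason wins.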

Also, considering that this is being targeted at QEMU VMs, I would
imagine that you're going to want to support PageTransHuge() in here
fairly soon. Do you anticipate that being much trouble? Have you
planned for it in here?

> +		useraddr = (unsigned long)sd->u.userptr;
> +		mm = current->mm;
> +
> +		ret = -EAGAIN;
> +		down_read(&mm->mmap_sem);
> +		vma = find_vma_intersection(mm, useraddr, useraddr + PAGE_SIZE);

If you are only doing these a page at a time, why bother with
find_vma_intersection()? Why not a plain find_vma()?
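
For a single page the two lookups differ only in the final bounds check.
The userspace model below mimics find_vma() (first VMA with
addr < vm_end) and find_vma_intersection() over a sorted array; the
model_* names, struct layout and MODEL_PAGE_SIZE are hypothetical, not
kernel API. Note that plain find_vma() can return a VMA that starts above
the address, so a vm_start test is assumed alongside it.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct vm_area_struct: [vm_start, vm_end). */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Mimics find_vma(): first VMA (sorted by address) with addr < vm_end. */
static const struct model_vma *model_find_vma(const struct model_vma *v,
					      size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < v[i].vm_end)
			return &v[i];
	return NULL;
}

/* Mimics find_vma_intersection(): drop the VMA if it starts at/after end. */
static const struct model_vma *
model_find_vma_intersection(const struct model_vma *v, size_t n,
			    unsigned long start, unsigned long end)
{
	const struct model_vma *vma = model_find_vma(v, n, start);

	if (vma && end <= vma->vm_start)
		vma = NULL;
	return vma;
}

/* Single-page variant: plain lookup plus "does it really contain addr?". */
static const struct model_vma *model_vma_covering(const struct model_vma *v,
						  size_t n, unsigned long addr)
{
	const struct model_vma *vma = model_find_vma(v, n, addr);

	if (vma && vma->vm_start > addr)
		vma = NULL;
	return vma;
}
```

Assuming useraddr is page-aligned, model_vma_covering(addr) and
model_find_vma_intersection(addr, addr + PAGE_SIZE) always pick the same
VMA; either way the caller still has to reject a NULL result.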

Also, if we fail to find a VMA, won't this return -EAGAIN? That seems
like a rather uninformative error code to hand back to userspace,
especially since retrying won't help.
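
A userspace sketch of the distinction: a check that reports a missing
mapping with an errno saying so, rather than one that invites a retry.
The model_* names are hypothetical, and -EFAULT is an assumed choice (the
errno conventionally used for a bad user address), not something the
patch specifies.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct: [vm_start, vm_end). */
struct model_user_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Returns 0 if vma covers addr. Otherwise returns -EFAULT (assumed
 * choice): nothing is mapped at addr, so repeating the same call cannot
 * succeed, which is what -EAGAIN would suggest to the caller.
 */
static int model_check_user_vma(const struct model_user_vma *vma,
				unsigned long addr)
{
	if (!vma || addr < vma->vm_start || addr >= vma->vm_end)
		return -EFAULT;
	return 0;
}
```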

> +		if (IS_ERR_OR_NULL(vma))
> +			goto up_copy;
> +		if (!vma->anon_vma) {
> +			ret = anon_vma_prepare(vma);
> +			if (ret)
> +				goto up_copy;
> +		}

The first thing anon_vma_prepare() does is check vma->anon_vma. This
extra check seems unnecessary.
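
A userspace model of why the outer check buys nothing: when the
"already set up?" test lives inside the callee, calling it
unconditionally is cheap and idempotent. The model_* names are
hypothetical; the real anon_vma_prepare() also takes locks and may
sleep, which is omitted here.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel's anon_vma / vm_area_struct. */
struct model_anon_vma {
	int refcount;
};

struct model_prep_vma {
	struct model_anon_vma *anon_vma;
};

static int model_allocations;

/*
 * Shaped like anon_vma_prepare(): returns 0 at once if anon_vma is
 * already set, and only allocates on the first call.
 */
static int model_anon_vma_prepare(struct model_prep_vma *vma)
{
	if (vma->anon_vma)
		return 0;
	vma->anon_vma = calloc(1, sizeof(*vma->anon_vma));
	if (!vma->anon_vma)
		return -ENOMEM;
	model_allocations++;
	return 0;
}
```

With the check inside the callee, the caller reduces to calling
anon_vma_prepare() unconditionally and jumping to up_copy on a non-zero
return.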

> +		zap_page_range(vma, useraddr, PAGE_SIZE, NULL);
> +		ret = lock_page_killable(page);
> +		if (ret)
> +			goto up_copy;
> +		ptep = get_locked_pte(mm, useraddr, &ptl);
> +		if (!ptep)
> +			goto unlock_up_copy;
> +		pte = *ptep;
> +		if (pte_present(pte))
> +			goto unlock_up_copy;
> +		get_page(page);
> +		page_add_anon_rmap(page, vma, useraddr);
> +		pte = mk_pte(page, vma->vm_page_prot);

'pte' is getting used for two different things here, which makes it a
bit confusing. I'd probably just skip this first assignment and
directly do:

	if (pte_present(*ptep))
		goto unlock_up_copy;

> +		set_pte_at(mm, useraddr, ptep, pte);
> +		update_mmu_cache(vma, useraddr, ptep);
> +		pte_unmap_unlock(ptep, ptl);
> +		ret = 0;
> +unlock_up_copy:
> +		unlock_page(page);
> +up_copy:
> +		up_read(&mm->mmap_sem);
> +		if (!ret) {
> +			ret = sd->len;
> +			goto out;
> +		}
> +		/* else ret < 0 and we should fallback to copying */
> +		VM_BUG_ON(ret > 0);
> +	}

This also screams to be broken out into a helper function instead of
just being inlined in the existing code.
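
A userspace model of that split: the zero-copy attempt lives in its own
helper that returns 0 on success or a negative errno, and the caller
falls back to copying on any error, as the patch does today. The model_*
names and the movable flag (standing in for all of the offset, length,
flag and page checks) are hypothetical; this sketches the structure, not
the kernel code.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a pipe buffer. */
struct model_buf {
	bool movable;		/* all zero-copy preconditions satisfied */
	const char *data;
	long len;
};

static int model_copies;

/* Zero-copy attempt: 0 on success, any negative errno means "copy". */
static int model_try_move_page(const struct model_buf *buf)
{
	if (!buf->movable)
		return -EINVAL;
	/* The kernel version would map the page into the user VMA here. */
	return 0;
}

/* Ordinary copy path. */
static long model_copy_to_user(const struct model_buf *buf, char *dst)
{
	memcpy(dst, buf->data, (size_t)buf->len);
	model_copies++;
	return buf->len;
}

/* The caller stays short: try the helper, fall back on any error. */
static long model_pipe_to_user(const struct model_buf *buf, char *dst)
{
	if (model_try_move_page(buf) == 0)
		return buf->len;
	return model_copy_to_user(buf, dst);
}
```

The goto-based unwinding (unlock_up_copy, up_copy) would then stay local
to the helper.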
