Re: [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry

From: Yan Zhao
Date: Fri Mar 17 2023 - 01:58:24 EST


On Fri, Mar 10, 2023 at 04:22:36PM -0800, Sean Christopherson wrote:
> When shadowing a GTT entry with a 2M page, explicitly verify that the
> first page pinned by VFIO is a transparent hugepage instead of assuming
> that page observed by is_2MB_gtt_possible() is the same page pinned by
> vfio_pin_pages(). E.g. if userspace is doing something funky with the
> guest's memslots, or if the page is demoted between is_2MB_gtt_possible()
> and vfio_pin_pages().
>
> This is more of a performance optimization than a bug fix as the check
> for contiguous struct pages should guard against incorrect mapping (even
> though assuming struct pages are virtually contiguous is wrong).
>
> The real motivation for explicitly checking for a transparent hugepage
> after pinning is that it will reduce the risk of introducing a bug in a
> future fix for a page refcount leak (KVMGT doesn't put the reference
> acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using
> KVM's gfn_to_pfn() altogether.
>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 8ae7039b3683..90997cc385b4 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -159,11 +159,25 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> goto err;
> }
>
> - if (npage == 0)
> - base_page = cur_page;
> + if (npage == 0) {
> + /*
> + * Bail immediately to avoid unnecessary pinning when
> + * trying to shadow a 2M page and the host page isn't
> + * a transparent hugepage.
> + *
> + * TODO: support other type hugepages, e.g. HugeTLB.
> + */
> + if (size == I915_GTT_PAGE_SIZE_2M &&
> + !PageTransHuge(cur_page))
Maybe the checking of PageTransHuge(cur_page) and bailing out is not necessary.
If a page is not transparent huge, but there are 512 contigous 4K
pages, I think it's still good to map them in IOMMU in 2M.
See vfio_pin_map_dma() who does similar things.

> + ret = -EIO;
> + else
> + base_page = cur_page;
> + }
> else if (base_page + npage != cur_page) {
> gvt_vgpu_err("The pages are not continuous\n");
> ret = -EINVAL;
> + }
> + if (ret < 0) {
> npage++;
> goto err;
> }
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>