Re: [PATCH v2] kvm/arm64: fixed passthrough gpu into vm on arm64

From: Jason Gunthorpe
Date: Fri Apr 01 2022 - 10:19:59 EST


On Fri, Apr 01, 2022 at 05:08:28PM +0800, xieming wrote:
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 51b791c750f1..6f66efb71743 100644
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1452,7 +1452,14 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> }
>
> vma->vm_private_data = vdev;
> +#ifdef CONFIG_ARM64
> + if (vfio_pci_is_vga(pdev))
> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> + else
> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +#else
> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +#endif

This is a user-visible change: if VFIO starts making things write
combining, then userspace has to use different barriers around MMIO.

Also this problem is bigger than just GPUs, lots of devices use write
combining memory for their BARs and will do so inside VMs as well - so
testing for 'pci_is_vga' is also not right.

I think you need to solve this by having userspace somehow request the
cacheability type for the mmap (though I'm not sure how KVM will know
what to do with that), or by having KVM always force write combining
for all ioremaps.

> +/**
> + * is_vma_write_combine - check if VMA is mapped with writecombine or not
> + * Return true if VMA mapped with MT_NORMAL_NC otherwise false
> + */
> +static inline bool is_vma_write_combine(struct vm_area_struct *vma)
> +{
> + pteval_t pteval = pgprot_val(vma->vm_page_prot);
> +
> + return ((pteval & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL_NC));
> +}

Shouldn't KVM be copying the exact pgprot bits from the VMA to the
KVM PTEs when it is mirroring them? E.g., the difference between
pgprot_device() and pgprot_noncached() seems relevant to preserve as well.
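
To make the point concrete, here is a rough standalone sketch (userspace C,
not kernel code) of carrying the full memory type across instead of a single
write-combine boolean. The helper names s1_attrindx() and
s2_memattr_from_s1() are hypothetical. The stage-1 index values are an
assumption taken from older arm64 kernels and differ between versions. The
stage-2 values are the architectural MemAttr[3:0] encodings without FWB.

```c
#include <assert.h>
#include <stdint.h>

/* Stage-1 MAIR indices (assumption: numbering used by older arm64 kernels). */
enum s1_memtype {
	MT_DEVICE_nGnRnE = 0,	/* what pgprot_noncached() selects */
	MT_DEVICE_nGnRE  = 1,	/* what pgprot_device() selects */
	MT_NORMAL_NC     = 3,	/* what pgprot_writecombine() selects */
	MT_NORMAL        = 4,
};

/* AttrIndx lives in PTE bits [4:2]. */
#define PTE_ATTRINDX(t)		((uint64_t)(t) << 2)
#define PTE_ATTRINDX_MASK	((uint64_t)7 << 2)

/* Stage-2 MemAttr[3:0] encodings (architectural, FWB disabled). */
#define S2_MEMATTR_DEVICE_nGnRnE	0x0
#define S2_MEMATTR_DEVICE_nGnRE		0x1
#define S2_MEMATTR_NORMAL_NC		0x5
#define S2_MEMATTR_NORMAL_WB		0xf

/* Sanity check on the bit layout assumed above. */
static_assert(PTE_ATTRINDX(MT_NORMAL_NC) == 0xc, "AttrIndx is bits [4:2]");

/* Extract the stage-1 memory type index from a raw PTE/pgprot value. */
static unsigned int s1_attrindx(uint64_t pteval)
{
	return (unsigned int)((pteval & PTE_ATTRINDX_MASK) >> 2);
}

/*
 * Hypothetical helper: pick the stage-2 memory attribute that matches the
 * stage-1 type, keeping nGnRnE, nGnRE and Normal-NC distinct.
 * Returns -1 for a type this sketch does not know about.
 */
static int s2_memattr_from_s1(uint64_t pteval)
{
	switch (s1_attrindx(pteval)) {
	case MT_DEVICE_nGnRnE:
		return S2_MEMATTR_DEVICE_nGnRnE;
	case MT_DEVICE_nGnRE:
		return S2_MEMATTR_DEVICE_nGnRE;
	case MT_NORMAL_NC:
		return S2_MEMATTR_NORMAL_NC;
	case MT_NORMAL:
		return S2_MEMATTR_NORMAL_WB;
	default:
		return -1;
	}
}
```

With a mapping like this the stage-2 entry follows whatever the VMA asked
for, so a boolean such as 'writecombine' is not needed and the
device/noncached distinction is not lost.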

> @@ -1209,7 +1221,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> pfn = __phys_to_pfn(pa);
>
> for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
> - pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE);
> + pte_t pte = pfn_pte(pfn, writecombine ? PAGE_S2_NC : PAGE_S2_DEVICE);

Please don't send patches to the mailing list that are against such
old kernels; this code was deleted in 2020.

Jason