Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

From: Alex Williamson
Date: Fri Sep 13 2013 - 12:25:53 EST


On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
> On 09/13/2013 07:23 AM, Alex Williamson wrote:
> > So far we've succeeded at making KVM and VFIO mostly unaware of each
> > other, but there's an important point where that breaks down. Intel
> > VT-d hardware may or may not support snoop control. When snoop
> > control is available, intel-iommu promotes No-Snoop transactions on
> > PCIe to be cache coherent. That allows KVM to handle things like the
> > x86 WBINVD opcode as a nop. When the hardware does not support this,
> > KVM must implement a hardware visible WBINVD for the guest.
> >
> > We could simply let userspace tell KVM how to handle WBINVD, but it's
> > privileged for a reason. Allowing an arbitrary user to enable
> > physical WBINVD gives them more access to the hardware. Previously,
> > this has only been enabled for guests supporting legacy PCI device
> > assignment. In such cases it's necessary for proper guest execution.
> > We therefore create a new KVM-VFIO virtual device. The user can add
> > and remove VFIO groups to this device via file descriptors. KVM
> > makes use of the VFIO external user interface to validate that the
> > user has access to physical hardware and gets the coherency state of
> > the IOMMU from VFIO. This provides equivalent functionality to
> > legacy KVM assignment, while keeping (nearly) all the bits isolated.
> >
> > The one intrusion is the resulting flag indicating the coherency
> > state. For this RFC it's placed on the x86 kvm_arch struct, however
> > I know POWER has interest in using the VFIO external user interface,
> > and I'm hoping we can share a common KVM-VFIO device. Perhaps they
> > care about No-Snoop handling as well or the code can be #ifdef'd.
>
>
> POWER does not support (at least book3s - "server", not sure about others)
> this cache-non-coherent stuff at all.

Then it's easy for your IOMMU API interface to return always cache
coherent or never cache coherent or whatever ;)
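
To illustrate the point, here is a minimal sketch of how the coherency
decision could be aggregated over registered groups. It is not code from
the patch: sketch_group, iommu_is_coherent, always_coherent,
never_coherent and sketch_needs_real_wbinvd are all hypothetical names,
and the per-group callback only stands in for whatever the IOMMU API
would report on a given platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VFIO group as seen from KVM. */
struct sketch_group {
	/* Platform hook: does the IOMMU behind this group enforce coherent DMA? */
	bool (*iommu_is_coherent)(const struct sketch_group *group);
};

/* A platform without non-coherent DMA can simply answer "always". */
static bool always_coherent(const struct sketch_group *group)
{
	(void)group;
	return true;
}

/* Models an IOMMU that cannot promote No-Snoop transactions. */
static bool never_coherent(const struct sketch_group *group)
{
	(void)group;
	return false;
}

/*
 * Returns true if a guest WBINVD would have to reach the hardware, i.e.
 * at least one registered group sits behind a non-coherent IOMMU.
 * With no groups registered the answer is false.
 */
static bool sketch_needs_real_wbinvd(const struct sketch_group *const *groups,
				     size_t n)
{
	assert(groups != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!groups[i]->iommu_is_coherent(groups[i]))
			return true;
	}
	return false;
}
```

Under these assumptions, a platform whose groups all use always_coherent
never sets the flag, so the common device code would need no
platform-specific branch.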

> Regarding reusing this device with external API for POWER - I posted a
> patch which introduces KVM device to link KVM with IOMMU but besides the
> list of groups registered in KVM, it also provides the way to find a group
> by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
> in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
> there window_size too (for a quick boundary check). I am not sure we want
> to mix everything here.
>
> It is in "[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
> handling" if you are interested (kvmppc_spapr_tce_iommu_device).

Yes, I stole the code to get the vfio symbols from your code. The
convergence I was hoping to achieve is that KVM doesn't really want to
know about VFIO and vice versa. We can therefore at least limit the
intrusion by sharing a common device. Obviously for you it will need
some extra interfaces to associate an LIOBN to a group, but we keep both
the kernel and userspace cleaner by avoiding duplication where we can.
Is this really not extensible to your usage? Thanks,

Alex
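
A rough sketch of the kind of extension discussed above, under stated
assumptions: a shared group list with common add handling, plus an
optional LIOBN association and lookup. None of this is taken from either
patch series. Every identifier here (sketch_kvm_vfio,
sketch_kvm_vfio_group, sketch_group_add, sketch_group_set_liobn,
sketch_find_by_liobn, SKETCH_NO_LIOBN, SKETCH_MAX_GROUPS) is
hypothetical, and a fixed-size array stands in for whatever list and
locking the kernel would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_GROUPS 8
#define SKETCH_NO_LIOBN   UINT64_MAX	/* hypothetical "no LIOBN set" marker */

/* Hypothetical per-group record shared by all architectures. */
struct sketch_kvm_vfio_group {
	int group_fd;		/* group the user registered by file descriptor */
	uint64_t liobn;		/* optional; SKETCH_NO_LIOBN when unused */
};

struct sketch_kvm_vfio {
	struct sketch_kvm_vfio_group groups[SKETCH_MAX_GROUPS];
	size_t count;
};

/* Common path: register a group. Returns 0, or -1 if full or duplicate. */
static int sketch_group_add(struct sketch_kvm_vfio *dev, int group_fd)
{
	assert(dev != NULL);

	if (dev->count == SKETCH_MAX_GROUPS)
		return -1;
	for (size_t i = 0; i < dev->count; i++) {
		if (dev->groups[i].group_fd == group_fd)
			return -1;
	}
	dev->groups[dev->count].group_fd = group_fd;
	dev->groups[dev->count].liobn = SKETCH_NO_LIOBN;
	dev->count++;
	return 0;
}

/* Extra interface: attach an LIOBN to an already registered group. */
static int sketch_group_set_liobn(struct sketch_kvm_vfio *dev, int group_fd,
				  uint64_t liobn)
{
	assert(dev != NULL);

	for (size_t i = 0; i < dev->count; i++) {
		if (dev->groups[i].group_fd == group_fd) {
			dev->groups[i].liobn = liobn;
			return 0;
		}
	}
	return -1;
}

/* Lookup as a DMA map/unmap hypercall path might need it; NULL if absent. */
static const struct sketch_kvm_vfio_group *
sketch_find_by_liobn(const struct sketch_kvm_vfio *dev, uint64_t liobn)
{
	assert(dev != NULL);

	if (liobn == SKETCH_NO_LIOBN)
		return NULL;
	for (size_t i = 0; i < dev->count; i++) {
		if (dev->groups[i].liobn == liobn)
			return &dev->groups[i];
	}
	return NULL;
}
```

In this sketch, users that never set an LIOBN only touch
sketch_group_add, so the extra field costs them nothing beyond the
storage.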
