Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cachecoherency

From: Alexey Kardashevskiy
Date: Sun Sep 15 2013 - 08:41:06 EST


On 09/14/2013 02:25 AM, Alex Williamson wrote:
> On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
>> On 09/13/2013 07:23 AM, Alex Williamson wrote:
>>> So far we've succeeded at making KVM and VFIO mostly unaware of each
>>> other, but there's any important point where that breaks down. Intel
>>> VT-d hardware may or may not support snoop control. When snoop
>>> control is available, intel-iommu promotes No-Snoop transactions on
>>> PCIe to be cache coherent. That allows KVM to handle things like the
>>> x86 WBINVD opcode as a nop. When the hardware does not support this,
>>> KVM must implement a hardware visible WBINVD for the guest.
>>>
>>> We could simply let userspace tell KVM how to handle WBINVD, but it's
>>> privileged for a reason. Allowing an arbitrary user to enable
>>> physical WBINVD gives them a more access to the hardware. Previously,
>>> this has only been enabled for guests supporting legacy PCI device
>>> assignment. In such cases it's necessary for proper guest execution.
>>> We therefore create a new KVM-VFIO virtual device. The user can add
>>> and remove VFIO groups to this device via file descriptors. KVM
>>> makes use of the VFIO external user interface to validate that the
>>> user has access to physical hardware and gets the coherency state of
>>> the IOMMU from VFIO. This provides equivalent functionality to
>>> legacy KVM assignment, while keeping (nearly) all the bits isolated.
>>>
>>> The one intrusion is the resulting flag indicating the coherency
>>> state. For this RFC it's placed on the x86 kvm_arch struct, however
>>> I know POWER has interest in using the VFIO external user interface,
>>> and I'm hoping we can share a common KVM-VFIO device. Perhaps they
>>> care about No-Snoop handling as well or the code can be #ifdef'd.
>>
>>
>> POWER does not support (at least boos3s - "server", not sure about others)
>> this cache-non-coherent stuff at all.
>
> Then it's easy for your IOMMU API interface to return always cache
> coherent or never cache coherent or whatever ;)
>
>> Regarding reusing this device with external API for POWER - I posted a
>> patch which introduces KVM device to link KVM with IOMMU but besides the
>> list of groups registered in KVM, it also provides the way to find a group
>> by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
>> in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
>> there window_size too (for a quick boundary check). I am not sure we want
>> to mix everything here.
>>
>> It is in "[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
>> handling" if you are interested (kvmppc_spapr_tce_iommu_device).
>
> Yes, I stole the code to get the vfio symbols from your code. The
> convergence I was hoping to achieve is that KVM doesn't really want to
> know about VFIO and vica versa. We can therefore at least limit the
> intrusion by sharing a common device. Obviously for you it will need
> some extra interfaces to associate an LIOBN to a group, but we keep both
> the kernel an userspace cleaner by avoiding duplication where we can.
> Is this really not extensible to your usage?

Well, I do not have a good taste for this kind of stuff so I cannot tell
for sure. I can reuse this device and hack it to do whatever I need but how?

There are 2 things I need from KVM device:
1. associate IOMMU group with LIOBN
2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table
pointer which is an IOMMU data of an IOMMU group so we could take a
shortcut here).

There are 3 possible interfaces to use:
A. set/get attributes
B. ioctl
C. additional API

What I posted a week ago uses A for 1 and C for 2. Now we move this to
virt/kvm/vfio.c.
A for 1 is more or less ok but how exactly? Yet another attribute? Platform
specific "bus ID"? In your patch attr->addr is not really an address (to be
accessed with get_user()) but just an fd.

For 2 - there are already A and B interfaces so we do not want C, right?
What kind of a good device has a backdoor? :) But A and B do not have
access control to prevent the user space from receiving a IOMMU group
pointer (which it would not be able to use anyway but still). Do we care
about this (I do not)? And using B (ioctls) within the kernel - I cannot
say I saw many usages of this.

I am pretty sure we will spend some time (not much) arguing about these
things and when we agree on something, then some of KVM maintainers will
step in and say that there is 1001st way of doing this and - goto start.
And after all, this still won't be a device as it does not emulate anything
present in the real hardware, this is just yet another interface to KVM and
some ways of using it won't be natural for somebody. /me sighs.



--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/