Re: [RFC 0/4] Virtio uses DMA API for all devices

From: Michael S. Tsirkin
Date: Wed Aug 08 2018 - 16:32:03 EST


On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote:
> Sure, but all of this is just the configuration of the iommu. But I
> think we agree here, and your point remains valid, indeed my proposed
> hack:
>
> > if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
>
> Will only work if the IOMMU and non-IOMMU path are completely equivalent.
>
> We can provide that guarantee for our secure VM case, but not generally so if
> we were to go down the route of a quirk in virtio, it might be better to
> make it painfully obvious that it's specific to that one case with a different
> kind of turd:
>
> - if (xen_domain())
> + if (xen_domain() || pseries_secure_vm())
> return true;

I don't think it's pseries specific actually. E.g. I suspect AMD SEV
might benefit from the same kind of hack.
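
As a rough illustration of a check that is not pseries specific, here is
a userspace sketch, not kernel code. The names struct guest_env,
arch_needs_dma_api() and virtio_use_dma_api() are hypothetical, and
mem_encrypted is an assumed stand-in for whatever a secure pseries VM or
an SEV guest would report:

```c
#include <stdbool.h>
#include <stdint.h>

/* VIRTIO_F_IOMMU_PLATFORM is feature bit 33 in the virtio spec. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Hypothetical description of the guest environment (not a kernel struct). */
struct guest_env {
    bool xen_domain;    /* stands in for xen_domain() */
    bool mem_encrypted; /* assumption: secure pseries VM, AMD SEV, ... */
};

/*
 * Hypothetical arch hook: true when the guest must go through the DMA API
 * even though the device did not negotiate VIRTIO_F_IOMMU_PLATFORM.
 */
bool arch_needs_dma_api(const struct guest_env *env)
{
    return env->xen_domain || env->mem_encrypted;
}

/* Models the decision discussed above: feature bit first, then the quirk. */
bool virtio_use_dma_api(uint64_t features, const struct guest_env *env)
{
    if (features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
        return true;
    return arch_needs_dma_api(env);
}
```

As Ben notes, this only holds up if the IOMMU and non-IOMMU paths are
completely equivalent for the guests that take the quirk.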


> So to summarize, and make sure I'm not missing something, the three approaches
> at hand are:
>
> 1- The above, which is a one liner and contained in the guest, so that's nice, but
> also means another turd in virtio which isn't ...
>
> 2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
> architecture on our side that will force virtio to always go through an emulated
> iommu, as pseries doesn't have the concept of a real bypass window, and thus will
> impact performance for both secure and non-secure VMs.
>
> 3- Invent a property that can be put in selected PCI device tree nodes that
> indicates that for that device specifically, the iommu can be bypassed, along with
> a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM
> but its DT nodes would also have that property and Linux would notice it and turn
> bypass on.

For completeness, virtio could also have its own bounce buffer,
separate from the DMA API's. I don't see much benefit in that,
though.


> The resulting properties of those options are:
>
> 1- Is what I want because it's the simplest, provides the best performance now,
> and works without code changes to qemu or non-secure Linux. However it does
> add a tiny turd to virtio which is annoying.
>
> 2- This works but it puts the iommu in the way always, thus reducing virtio performance
> across the board for pseries unless we only do that for secure VMs but that is
> difficult (as discussed earlier).
>
> 3- This would recover the performance lost in -2-, however it requires qemu *and*
> guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
> performance hit of -2- unless modified to call that 'enable bypass' call, which
> isn't great.
>
> So imho we have to choose one of 3 not-great solutions here... Unless I missed
> something in your ideas of course.
>
> Cheers,
> Ben.
>
>