Re: [RFC PATCH 1/3] content: Add VIRTIO_F_SWIOTLB to negotiate use of SWIOTLB bounce buffers

From: Michael S. Tsirkin
Date: Fri Apr 04 2025 - 06:37:59 EST


On Fri, Apr 04, 2025 at 11:15:33AM +0100, David Woodhouse wrote:
> On Fri, 2025-04-04 at 10:27 +0100, David Woodhouse wrote:
> > On 4 April 2025 09:32:39 BST, "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> > wrote:
> > > On Fri, Apr 04, 2025 at 09:16:44AM +0100, David Woodhouse wrote:
> > > > On Fri, 2025-04-04 at 04:09 -0400, Michael S. Tsirkin wrote:
> > > > > On Fri, Apr 04, 2025 at 08:50:47AM +0100, David Woodhouse
> > > > > wrote:
> > > > > > What's annoying is that this should work out of the box
> > > > > > *already* with
> > > > > > virtio-mmio and a `restricted-dma-pool` — for systems which
> > > > > > aren't
> > > > > > afflicted by UEFI/ACPI/PCI as their discovery mechanisms.
> > > > >
> > > > >
> > > > > That specifically would be just a driver bugfix then?
> > > >
> > > > I actually think it works out of the box and there isn't even a
> > > > bug to
> > > > fix. Haven't tested yet.
> > > >
> > > > The sad part is that the system does it all automatically *if* it
> > > > has
> > > > CONFIG_DMA_RESTRICTED_POOL (e.g. Linux) and the driver never even
> > > > notices that the dma_ops it's using are the swiotlb ops using the
> > > > provided buffer.
> > > >
> > > > Which is *kind* of nice... except that when on a guest OS which
> > > > *isn't*
> > > > Linux with CONFIG_DMA_RESTRICTED_POOL, the guest will just ignore
> > > > the
> > > > `restricted-dma-pool` node and try DMA to system memory anyway,
> > > > which
> > > > will fail.
> > >
> > > I mean, it's easy to misconfigure Linux, this is why we love it ;)
> > > Why
> > > is this such a concern?
> >
> > Because it's incompatible. In the DT world, perhaps this new *non-
> > optional* feature/restriction should have come with a new
> > "compatible" string such as "virtio-mmio-restricted-dma".
> >
> > Adding it without backwards compatibility wasn't ideal.
> >
> > > > That's why my proposal adds the negotiated VIRTIO_F_SWIOTLB
> > > > feature, so
> > > > that the device side can refuse, if the guest *isn't* agreeing to
> > > > use
> > > > the bounce buffer in the situations where it must do so.
> > >
> > >
> > > OTOH then setting this feature and if you make the device force it,
> > > you are breaking guests restricted-dma-pool which worked
> > > previously, no?
> >
> > Yes. So a platform offering virtio-mmio with restricted DMA, if the
> > driver doesn't accept the offered VIRTIO_F_SWIOTLB, may want to
> > accept that negotiation anyway, and *hope* that the driver/OS are
> > going to use the buffer anyway.
> >
> > I just didn't want to make that same mistake again when formalising
> > and documenting this, and especially when attempting to extend it to
> > PCI.
>
> Of course, the beauty of the restricted-dma-pool as supported by DT is
> that it's a *system* memory buffer, which is actually OK as long as
> it's reserved address space and not just part of normal system memory
> that an unsuspecting guest might use for general purposes. So the
> trusted part of the hypervisor (e.g. pKVM) can *allow* the VMM access
> to that space.
>
> It doesn't *have* to be on-device. That just seemed like the more
> natural way to do it for PCI.
>
> I suppose we *could* allow for the virtio-pci transport to do it the
> same way as virtio-mmio though. The VIRTIO_PCI_CAP_SWIOTLB capability¹
> could reference a range of system memory space, just like the
> `restricted-dma-pool` property does.
>
> It's a weird abstraction especially for a physical PCI device to do
> that because the system memory space is outside its ownership. But in a
> physical device it could be writable, and you could consider it the
> responsibility of the system firmware to configure it appropriately, in
> accordance with the IOMMU and other DMA restrictions of the platform.
>
> That does solve it for the CoCo case without addressing the P2P staging
> case that Christoph mentions, though.
>
>
> ¹ I will rename it, Christoph, if it survives at all. Probably
> VIRTIO_F_RESTRICTED_DMA and VIRTIO_PCI_CAP_RESTRICTED_DMA but of course
> it depends on the semantics we conclude it should have.

OK. So basically, all this does, is a promise by driver to only
DMA into a range of memory?
This part, I get. I wouldn't put it in a capability, just in config
space then.

What I don't get, is what does the *device* want, exactly?

Here's a vague idea to explain the question:

some embedded devices can have addressing restrictions, such
as the number of bits of an address.
In another example, the legacy balloon device only supports addresses
up to 48 bit.
These can still be useful if driver does not use the inaccessible
addresses.



So far so good? Does this look like a generalization of your idea?


Now, a question: DMA API under linux at least, can actually work
around device limitations using a bounce buffer. It does, however,
need to know what to work around.
So, maybe the device needs to expose a range?
But what is the range in your case?


Maybe the new registers list the range of addresses device can access,
and driver promises to be within that range by negotiating the
feature bit?


--
MST