Re: [PATCH 3/5] vDPA: introduce vDPA bus

From: Jason Wang
Date: Tue Jan 21 2020 - 03:35:54 EST

On 2020/1/21 4:15 PM, Michael S. Tsirkin wrote:
On Tue, Jan 21, 2020 at 04:00:38PM +0800, Jason Wang wrote:
On 2020/1/21 1:47 PM, Michael S. Tsirkin wrote:
On Tue, Jan 21, 2020 at 12:00:57PM +0800, Jason Wang wrote:
On 2020/1/21 1:49 AM, Jason Gunthorpe wrote:
On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
This is similar to the design of the platform IOMMU part of vhost-vdpa. We
decided to send diffs to the platform IOMMU there. If it's OK to do that in
the driver, we can replace set_map with an incremental API like map()/unmap().

Then the driver needs to maintain the rbtree itself.
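To make the last point concrete: with an incremental map()/unmap() API the driver has to remember what it has mapped, and an unmap() that covers only part of an existing mapping must split it. The sketch below is hypothetical (`IovaTree` and its methods are illustrative names, not vhost-vdpa code), and it uses a sorted Python list where a real driver would use an rbtree or interval tree.

```python
import bisect


class IovaTree:
    """Driver-side record of IOVA mappings for a hypothetical map()/unmap() API.

    A sorted list of non-overlapping (start, end, pa) tuples stands in for
    the rbtree a kernel driver would use. IOVAs in [start, end) translate
    to pa + (iova - start).
    """

    def __init__(self):
        self._maps = []

    def mappings(self):
        return list(self._maps)

    def map(self, start, end, pa):
        if start >= end:
            raise ValueError("empty range")
        for s, e, _ in self._maps:
            if s < end and start < e:
                raise ValueError("range overlaps an existing mapping")
        bisect.insort(self._maps, (start, end, pa))

    def unmap(self, start, end):
        kept = []
        for s, e, pa in self._maps:
            if e <= start or end <= s:
                kept.append((s, e, pa))  # untouched by this unmap
                continue
            if s < start:
                kept.append((s, start, pa))  # piece below the hole survives
            if end < e:
                kept.append((end, e, pa + (end - s)))  # piece above survives
        self._maps = kept  # pieces are emitted in order, so still sorted

    def translate(self, iova):
        # (iova + 1,) sorts after every tuple whose start is <= iova, so
        # i is the index of the last mapping that starts at or before iova.
        i = bisect.bisect_left(self._maps, (iova + 1,)) - 1
        if i >= 0:
            start, end, pa = self._maps[i]
            if iova < end:
                return pa + (iova - start)
        return None
```

Unmapping [8G, 16G) from a single [4G, 16G) mapping leaves one [4G, 8G) entry, and unmapping the middle of a range leaves two pieces whose translations stay consistent with the original one.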
I think we really need to see two modes: one where there is a fixed
translation without dynamic vIOMMU-driven changes, and one that
supports vIOMMU.
I think in this case you meant the method proposed by Shahaf that sends
diffs of the "fixed translation" to the device?

It would be tricky to deal with the following case, for example:

old map [4G, 16G), new map [4G, 8G)

If we do

1) flush [4G, 16G)
2) add [4G, 8G)

There could be a window between 1) and 2) in which [4G, 8G) is not
mapped at all, so DMA to it could fault.

Avoiding that requires an IOMMU that can do:

1) remove [8G, 16G)
2) flush [8G, 16G)
3) change [4G, 8G)

....
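A minimal sketch of turning the old/new maps above into a diff, assuming each map is a sorted list of non-overlapping half-open (start, end) IOVA ranges and that addresses present in both maps keep the same translation. `subtract()` and `map_diff()` are illustrative helpers, not an existing API. Only the ranges in `to_remove` need to be unmapped and flushed; where a translation does change inside the overlap, that part would still need the separate "change" step.

```python
def subtract(ranges, holes):
    """Return the parts of `ranges` not covered by `holes`.

    Both inputs are sorted lists of non-overlapping half-open
    (start, end) tuples.
    """
    out = []
    for start, end in ranges:
        cur = start  # lowest address of this range not yet accounted for
        for hs, he in holes:
            if he <= cur or hs >= end:
                continue  # hole does not touch what is left of the range
            if hs > cur:
                out.append((cur, hs))  # gap before the hole survives
            cur = max(cur, he)
            if cur >= end:
                break
        if cur < end:
            out.append((cur, end))
    return out


def map_diff(old, new):
    """Split a map update into (to_remove, to_add).

    Assumes addresses present in both maps keep the same translation,
    so they appear in neither list and are never unmapped.
    """
    return subtract(old, new), subtract(new, old)
```

For the case above, with `G = 1 << 30`, `map_diff([(4 * G, 16 * G)], [(4 * G, 8 * G)])` returns `([(8 * G, 16 * G)], [])`: only [8G, 16G) has to be removed and flushed, and [4G, 8G) is never unmapped, so there is no window.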
Basically, what I had in mind is something like the QEMU memory API:

0. begin
1. remove [8G, 16G)
2. add [4G, 8G)
3. commit

This sounds more flexible, e.g. the driver may choose to implement the
static mapping one through commit. But one question: it looks to me like
this still requires DMA to be synchronized with at least the commit.
Otherwise the device may get a DMA fault? Or is the device expected to
pause DMA during begin?

Thanks
For example, commit might switch one set of tables for another,
without need to pause DMA.
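One way to read the begin/remove/add/commit proposal together with the table-switch idea: edits between begin and commit go to a private copy, and commit publishes that copy in one step, so the device sees either the whole old map or the whole new one. This is a hypothetical Python model (`TwoTableMap` and the 1 GiB granule are assumptions). In CPython a single attribute assignment is atomic with respect to other threads; real hardware would need something like a single root-pointer update, and whether a given device can switch tables that way is the open question here.

```python
GRANULE = 1 << 30  # assumption: 1 GiB "pages" keep the table tiny


class TwoTableMap:
    """Hypothetical begin/remove/add/commit API over two translation tables.

    Ranges are assumed GRANULE-aligned. Between begin() and commit() the
    device keeps translating through the old table.
    """

    def __init__(self):
        self._active = {}    # page -> pa page; what the device uses
        self._staged = None  # private copy edited inside a transaction

    def _open(self):
        if self._staged is None:
            raise RuntimeError("call begin() first")
        return self._staged

    def begin(self):
        if self._staged is not None:
            raise RuntimeError("transaction already open")
        self._staged = dict(self._active)

    def remove(self, start, end):
        table = self._open()
        for page in range(start // GRANULE, end // GRANULE):
            table.pop(page, None)

    def add(self, start, end, pa):
        table = self._open()
        first = start // GRANULE
        for page in range(first, end // GRANULE):
            table[page] = pa // GRANULE + (page - first)

    def commit(self):
        new = self._open()
        # One reference store switches tables: a reader sees either the
        # complete old table or the complete new one, never a mix.
        self._active = new
        self._staged = None

    def translate(self, iova):
        table = self._active  # snapshot once so a lookup uses one table
        page = table.get(iova // GRANULE)
        if page is None:
            return None
        return page * GRANULE + iova % GRANULE
```

With [4G, 16G) committed, a transaction doing remove [8G, 16G) and add [4G, 8G) leaves a lookup of 9G working until commit(), after which it returns None, so DMA never has to be paused in this model.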


Yes, I think that works, but it needs confirmation from Shahaf or Jason.

Thanks

Anyway, I'm fine with a one-shot API for now, we can
improve it later.

There are different optimization goals in the drivers for these two
configurations.

If the first one, then I think memory hotplug is a heavy flow
regardless. Do you think the extra cycles for the tree traversal
will be visible in any way?
I think if the driver can pause DMA while the new mapping is
being set up, it should be fine.
This is very tricky for any driver if the mapping change hits the
virtio rings. :(

Even an IOMMU-using driver is going to have problems with that.

Jason
Or I wonder whether ATS/PRI can help here. E.g. during an I/O page
fault, the driver/device could wait for the new mapping to be set up and
then replay the DMA.
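A toy model of the fault-and-replay flow, assuming ATS/PRI-style behavior: a DMA to an unmapped IOVA stalls instead of failing, and is retried once the driver installs the mapping. `FaultingIommu`, `install()` and `dma()` are hypothetical names; a condition variable stands in for the page request/response exchange, and the PCIe protocol itself is not modelled.

```python
import threading


class FaultingIommu:
    """Toy model of "fault, wait for the mapping, replay the DMA".

    A condition variable stands in for the page request/response exchange;
    the table is keyed by exact IOVA to keep the model tiny.
    """

    def __init__(self):
        self._table = {}  # iova -> pa
        self._cond = threading.Condition()
        self.faults = 0   # accesses that had to wait for a mapping

    def install(self, iova, pa):
        # Driver side: publish the mapping and wake any stalled DMA.
        with self._cond:
            self._table[iova] = pa
            self._cond.notify_all()

    def dma(self, iova, timeout=5.0):
        # Device side: translate; on a miss, record a fault and wait.
        with self._cond:
            if iova not in self._table:
                self.faults += 1
                if not self._cond.wait_for(lambda: iova in self._table, timeout):
                    raise TimeoutError(f"no mapping for {iova:#x}")
            return self._table[iova]  # the access is replayed here
```

If a device thread calls `dma(0x2000)` before the driver calls `install(0x2000, 0x9000)`, the device thread blocks and then returns `0x9000`; the timeout bounds how long a stalled access may wait.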

Thanks