RE: [PATCH v2 1/3] docs: IOMMU user API

From: Liu, Yi L
Date: Sun Jun 21 2020 - 01:46:56 EST


> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Saturday, June 20, 2020 12:38 AM
>
> On Fri, 19 Jun 2020 03:30:24 +0000
> "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote:
>
> > Hi Alex,
> >
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Friday, June 19, 2020 10:55 AM
> > >
> > > On Fri, 19 Jun 2020 02:15:36 +0000
> > > "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > > > Sent: Friday, June 19, 2020 5:48 AM
> > > > >
> > > > > On Wed, 17 Jun 2020 08:28:24 +0000
> > > > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > > > >
> > > > > > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > > > > > Sent: Wednesday, June 17, 2020 2:20 PM
> > > > > > >
> > > > > > > > From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > > > > > > Sent: Tuesday, June 16, 2020 11:22 PM
> > > > > > > >
> > > > > > > > On Thu, 11 Jun 2020 17:27:27 -0700 Jacob Pan
> > > > > > > > <jacob.jun.pan@xxxxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But then I thought it would be even better if VFIO leaves the
> > > > > > > > > > entire copy_from_user() to the layer consuming it.
> > > > > > > > > >
> > > > > > > > > OK, sounds good; that was what Kevin suggested as well. I just
> > > > > > > > > wasn't sure how much VFIO wants to inspect; I thought the VFIO
> > > > > > > > > layer wanted to do a sanity check.
> > > > > > > > >
> > > > > > > > > Anyway, I will move copy_from_user to the iommu uapi layer.
> > > > > > > >
> > > > > > > > Just one more point brought up by Yi when we discussed this offline.
> > > > > > > >
> > > > > > > > If we move copy_from_user to the iommu uapi layer, then there will
> > > > > > > > be multiple copy_from_user calls for the same data when a VFIO
> > > > > > > > container has multiple domains or devices. For bind, that might be
> > > > > > > > OK, but it may add overhead for TLB flush requests from the guest.
> > > > > > >
> > > > > > > I think it is the same for the bind and TLB flush paths: there will
> > > > > > > be multiple copy_from_user calls.
> > > > > >
> > > > > > Multiple copies are possibly fine. In reality we allow only one
> > > > > > group per nesting container (as described in patch [03/15]), and
> > > > > > usually there is just one SVA-capable device per group.
> > > > > >
> > > > > > >
> > > > > > > BTW, moving the data copy to the iommu layer raises another point
> > > > > > > to consider. VFIO needs to do an unbind in the bind path if the bind
> > > > > > > fails, so it will assemble unbind_data itself and pass it to the
> > > > > > > iommu layer. If the iommu layer does the copy_from_user, I think that
> > > > > > > will fail, since unbind_data is then a kernel buffer. Any idea?
> > > > >
> > > > > If a call into a UAPI fails, there should be nothing to undo.
> > > > > Creating a partial setup for a failed call that needs to be undone
> > > > > by the caller is not good practice.
> > > >
> > > > Is it still a problem if it's VFIO that undoes the partial setup
> > > > before returning to user space?
> > >
> > > Yes. If a UAPI function fails there should be no residual effect.
> >
> > OK. iommu_sva_bind_gpasid() is a per-device call, and it leaves no
> > residual effect if it fails, so no partial setup happens per device.
> >
> > But VFIO needs to use iommu_group_for_each_dev() to do the bind, so
> > if iommu_group_for_each_dev() fails partway, I guess VFIO needs to
> > undo the partial setup for the group, right?
>
> Yes, each individual call should make no changes if it fails, but the
> caller would need to unwind separate calls. If this introduces too
> much knowledge to the caller for the group case, maybe there should be
> a group-level function in the iommu code to handle that. Thanks,


Got it. I don't think VFIO needs much knowledge beyond the group info
and the bind data. I may send an updated version based on your
comments.
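For reference, the group-level unwind you describe could look roughly
like the sketch below. It is only a userspace model under my own
assumptions: struct fake_dev, fake_dev_bind(), fake_dev_unbind() and
fake_group_bind() are hypothetical names, with fake_dev_bind() standing
in for a per-device call such as iommu_sva_bind_gpasid() that makes no
changes when it fails. The group function binds devices in order and,
on the first failure, unbinds the already-bound devices in reverse, so
the caller sees no residual effect.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. struct fake_dev, fake_dev_bind(), fake_dev_unbind()
 * and fake_group_bind() are hypothetical; fake_dev_bind() stands in for a
 * per-device call such as iommu_sva_bind_gpasid().
 */
struct fake_dev {
	bool bound;      /* true once a bind on this device has succeeded */
	bool fail_bind;  /* test knob: force the bind on this device to fail */
};

/* Per-device bind: on failure it changes nothing (no residual effect). */
static int fake_dev_bind(struct fake_dev *dev)
{
	if (dev->fail_bind)
		return -EINVAL;
	dev->bound = true;
	return 0;
}

static void fake_dev_unbind(struct fake_dev *dev)
{
	dev->bound = false;
}

/*
 * Group-level bind: bind each device in order. If device i fails, unbind
 * devices i-1 .. 0 in reverse order and return that device's error, so a
 * failed group bind leaves nothing behind.
 */
static int fake_group_bind(struct fake_dev *devs, size_t n)
{
	size_t i;
	int ret = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		ret = fake_dev_bind(&devs[i]);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	while (i-- > 0)
		fake_dev_unbind(&devs[i]);
	return ret;
}
```

Kernel code would walk the group's device list rather than an array;
the array only keeps the model self-contained.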

Thanks,
Yi Liu

> Alex
>
> > > > > > This might be mitigated if we go back to using the same bind_data
> > > > > > for both bind and unbind. Then you can reuse the user object for
> > > > > > unwinding.
> > > > > >
> > > > > > However, there is another case where VFIO may need to assemble the
> > > > > > bind_data itself. When a VM is killed, VFIO needs to walk the
> > > > > > allocated PASIDs and unbind them one by one. In that case
> > > > > > copy_from_user doesn't work, since the data is created by the kernel.
> > > > > > Alex, do you have a suggestion for how this usage can be supported?
> > > > > > E.g., asking the IOMMU driver to provide two sets of APIs to handle
> > > > > > user- and kernel-generated requests?
> > > > >
> > > > > Yes, it seems like vfio would need to make use of a driver API to do
> > > > > this; we shouldn't be faking a user buffer in the kernel in order to
> > > > > call through to a UAPI. Thanks,
> > > >
> > > > OK, so if VFIO wants to issue an unbind by itself, it should use an
> > > > API that passes a kernel buffer to the IOMMU layer. If the unbind
> > > > request comes from user space, VFIO should use another API that passes
> > > > the user buffer pointer to the IOMMU layer. Makes sense; I will align
> > > > with Jacob.
> > >
> > > Sounds right to me. Different approaches might be used for the driver
> > > API versus the UAPI; perhaps there is no buffer. Thanks,
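A rough sketch of that split, assuming nothing about the eventual iommu
API: every name below (struct unbind_req, fake_copy_from_user(),
drv_unbind_pasid(), uapi_unbind(), teardown_all_pasids()) is hypothetical
and the code is a userspace model. The driver-level call takes the PASID
directly, with no buffer, while the UAPI entry copies the user data once,
checks argsz and flags, and then delegates. The VM-teardown path calls
the driver-level function directly, so no user buffer is faked in the
kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace model only; all names are hypothetical, not real IOMMU/VFIO APIs. */
#define MODEL_MAX_PASID 8

/* Hypothetical user-visible request layout; argsz must stay the first field. */
struct unbind_req {
	uint32_t argsz;  /* size of the struct as filled in by user space */
	uint32_t flags;  /* no flags are defined in this model, must be 0 */
	uint32_t pasid;  /* guest PASID to unbind */
};

/* Which PASIDs are currently bound (stands in for IOMMU driver state). */
static bool pasid_bound[MODEL_MAX_PASID];

/*
 * Stand-in for copy_from_user(): returns the number of bytes NOT copied.
 * A NULL source plays the role of a faulting user address.
 */
static size_t fake_copy_from_user(void *dst, const void *usrc, size_t n)
{
	assert(dst != NULL);
	if (usrc == NULL)
		return n;
	memcpy(dst, usrc, n);
	return 0;
}

/* Driver-level API for kernel-initiated requests: plain arguments, no buffer. */
static int drv_unbind_pasid(uint32_t pasid)
{
	if (pasid >= MODEL_MAX_PASID || !pasid_bound[pasid])
		return -EINVAL;
	pasid_bound[pasid] = false;
	return 0;
}

/* UAPI entry: copy and validate the user buffer once, then delegate. */
static int uapi_unbind(const void *ubuf)
{
	struct unbind_req req;
	uint32_t argsz;

	/* Read argsz first so the size is checked before copying the rest. */
	if (fake_copy_from_user(&argsz, ubuf, sizeof(argsz)))
		return -EFAULT;
	if (argsz < sizeof(req))
		return -EINVAL;
	if (fake_copy_from_user(&req, ubuf, sizeof(req)))
		return -EFAULT;
	if (req.flags)
		return -EINVAL;
	return drv_unbind_pasid(req.pasid);
}

/* VM teardown: kernel-initiated, so call the driver API directly. */
static void teardown_all_pasids(void)
{
	for (uint32_t p = 0; p < MODEL_MAX_PASID; p++)
		if (pasid_bound[p])
			(void)drv_unbind_pasid(p);
}
```

Keeping the user-buffer handling in one thin wrapper means the
kernel-initiated paths (bind-failure unwind, VM teardown) never go
through copy_from_user at all.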
> >
> > Thanks for your coaching. It may require Jacob to add APIs in the
> > iommu layer for the two purposes.
> >
> > Regards,
> > Yi Liu
> >
> > > Alex
> >