RE: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

From: Liu, Yi L
Date: Fri Apr 17 2020 - 02:03:46 EST


Hi Alex,
> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Thursday, April 16, 2020 10:41 PM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Subject: Re: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE
>
> On Thu, 16 Apr 2020 10:40:03 +0000
> "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote:
>
> > Hi Alex,
> > Still have a direction question with you. Better get agreement with you
> > before heading forward.
> >
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Friday, April 3, 2020 11:35 PM
> > [...]
> > > > > > + *
> > > > > > + * returns: 0 on success, -errno on failure.
> > > > > > + */
> > > > > > +struct vfio_iommu_type1_cache_invalidate {
> > > > > > + __u32 argsz;
> > > > > > + __u32 flags;
> > > > > > + struct iommu_cache_invalidate_info cache_info;
> > > > > > +};
> > > > > > +#define VFIO_IOMMU_CACHE_INVALIDATE _IO(VFIO_TYPE,
> > > VFIO_BASE
> > > > > + 24)
> > > > >
> > > > > The future extension capabilities of this ioctl worry me, I wonder if
> > > > > we should do another data[] with flag defining that data as CACHE_INFO.
> > > >
> > > > Can you elaborate? Does it mean with this way we don't rely on iommu
> > > > driver to provide version_to_size conversion and instead we just pass
> > > > data[] to iommu driver for further audit?
> > >
> > > No, my concern is that this ioctl has a single function, strictly tied
> > > to the iommu uapi. If we replace cache_info with data[] then we can
> > > define a flag to specify that data[] is struct
> > > iommu_cache_invalidate_info, and if we need to, a different flag to
> > > identify data[] as something else. For example if we get stuck
> > > expanding cache_info to meet new demands and develop a new uapi to
> > > solve that, how would we expand this ioctl to support it rather than
> > > also create a new ioctl? There's also a trade-off in making the ioctl
> > > usage more difficult for the user. I'd still expect the vfio layer to
> > > check the flag and interpret data[] as indicated by the flag rather
> > > than just passing a blob of opaque data to the iommu layer though.
> > > Thanks,
> >
> > Based on your comments about defining a single ioctl and a unified
> > vfio structure (with a @data[] field) for pasid_alloc/free, bind/
> > unbind_gpasid, cache_inv. After some offline trying, I think it would
> > be good for bind/unbind_gpasid and cache_inv as both of them use the
> > iommu uapi definition. While the pasid alloc/free operation doesn't.
> > It would be weird to put all of them together. So pasid alloc/free
> > may have a separate ioctl. It would look as below. Does this direction
> > look good per your opinion?
> >
> > ioctl #22: VFIO_IOMMU_PASID_REQUEST
> > /**
> > * @pasid: used to return the pasid alloc result when flags == ALLOC_PASID
> > * specify a pasid to be freed when flags == FREE_PASID
> > * @range: specify the allocation range when flags == ALLOC_PASID
> > */
> > struct vfio_iommu_pasid_request {
> > __u32 argsz;
> > #define VFIO_IOMMU_ALLOC_PASID (1 << 0)
> > #define VFIO_IOMMU_FREE_PASID (1 << 1)
> > __u32 flags;
> > __u32 pasid;
> > struct {
> > __u32 min;
> > __u32 max;
> > } range;
> > };
>
> Can't the ioctl return the pasid valid on alloc (like GET_DEVICE_FD)?

Yep, I think you mentioned before. At that time, I believed it would be
better to return the result via a __u32 buffer so that make full use of
the 32 bits. But looks like it doesn't make much difference. I'll follow
your suggestion.

> Would it be useful to support freeing a range of pasids? If so then we
> could simply use range for both, ie. allocate a pasid from this range
> and return it, or free all pasids in this range? vfio already needs to
> track pasids to free them on release, so presumably this is something
> we could support easily.

yes, I think it is a nice thing. then I can remove the @pasid field.
will do it.

> > ioctl #23: VFIO_IOMMU_NESTING_OP
> > struct vfio_iommu_type1_nesting_op {
> > __u32 argsz;
> > __u32 flags;
> > __u32 op;
> > __u8 data[];
> > };
>
> data only has 4-byte alignment, I think we really want it at an 8-byte
> alignment. This is why I embedded the "op" into the flag for
> DEVICE_FEATURE. Thanks,

got it. I may also merge the op into flags (maybe the lower 16 bits for
op).

Thanks,
Yi Liu
> Alex
>
> >
> > /* Nesting Ops */
> > #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL 0
> > #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL 1
> > #define VFIO_IOMMU_NESTING_OP_CACHE_INVLD 2
> >
> > Thanks,
> > Yi Liu
> >