Re: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.

From: Dey, Megha
Date: Fri May 01 2020 - 18:31:55 EST


Hi Jason,

On 4/23/2020 12:18 PM, Jason Gunthorpe wrote:
On Wed, Apr 22, 2020 at 02:24:11PM -0700, Dan Williams wrote:
On Tue, Apr 21, 2020 at 4:55 PM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:

On Tue, Apr 21, 2020 at 04:33:46PM -0700, Dave Jiang wrote:
The actual code is independent of the stage 2 driver code submission that adds
support for SVM, ENQCMD(S), PASID, and shared workqueues. This code series will
support dedicated workqueue on a guest with no vIOMMU.

A new device type "mdev" is introduced for the idxd driver. This allows the wq
to be dedicated to the usage of a VFIO mediated device (mdev). Once the work
queue (wq) is enabled, an uuid generated by the user can be added to the wq
through the uuid sysfs attribute for the wq. After the association, a mdev can
be created using this UUID. The mdev driver code will associate the uuid and
setup the mdev on the driver side. When the create operation is successful, the
uuid can be passed to qemu. When the guest boots up, it should discover a DSA
device when doing PCI discovery.

I'm feeling really skeptical that adding all this PCI config space and
MMIO BAR emulation to the kernel just to cram this into a VFIO
interface is a good idea, that kind of stuff is much safer in
userspace.

Particularly since vfio is not really needed once a driver is using
the PASID stuff. We already have general code for drivers to use to
attach a PASID to a mm_struct - and using vfio while disabling all the
DMA/iommu config really seems like an abuse.

A /dev/idxd char dev that mmaps a bar page and links it to a PASID
seems a lot simpler and saner kernel wise.

The mdev utilizes Interrupt Message Store or IMS[3] instead of MSIX for
interrupts for the guest. This preserves MSIX for host usages and also allows a
significantly larger number of interrupt vectors for guest usage.

I never did get a reply to my earlier remarks on the IMS patches.

The concept of a device specific addr/data table format for MSI is not
Intel specific. This should be general code. We have a device that can
use this kind of kernel capability today.

This has been my concern reviewing the implementation. IMS needs more
than one in-tree user to validate degrees of freedom in the api. I had
been missing a second "in-tree user" to validate the scope of the
flexibility that was needed.

IMS is too narrowly specified.

All platforms that support MSI today can support IMS. It is simply a
way for the platform to give the driver an addr/data pair that triggers
an interrupt when a posted write is performed to that pair.


Well, yes and no. IMS requires interrupt remapping in addition to the dynamic nature of IRQ allocation.

This is different from the other interrupt setup flows which are
tightly tied to the PCI layer. Here the driver should simply ask for
interrupts.

Ie the entire IMS API to the driver should be something very simple
like:

struct message_irq
{
uint64_t addr;
uint32_t data;
};

struct message_irq *request_message_irq(
struct device *, irq_handler_t handler, unsigned long flags,
const char *name, void *dev);

And the plumbing underneath should setup the irq chips and so forth as
required.


yes, this seems correct.
Jason