Re: [RFC PATCH 2/2] vfio: Include no-iommu mode

From: Alex Williamson
Date: Mon Oct 12 2015 - 13:47:03 EST


On Mon, 2015-10-12 at 19:27 +0300, Michael S. Tsirkin wrote:
> On Mon, Oct 12, 2015 at 08:56:07AM -0700, Stephen Hemminger wrote:
> > On Fri, 09 Oct 2015 12:41:10 -0600
> > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> >
> > > There is really no way to safely give a user full access to a PCI device
> > > without an IOMMU to protect the host from errant DMA. There is also
> > > no way to provide DMA translation, for use cases such as device
> > > assignment to virtual machines. However, there are still those users
> > > that want userspace drivers under those conditions. The UIO driver
> > > exists for this use case, but does not provide the degree of device
> > > access and programming that VFIO has. In an effort to avoid code
> > > duplication, this introduces a No-IOMMU mode for VFIO.
> > >
> > > This mode requires enabling CONFIG_VFIO_NOIOMMU and loading the vfio
> > > module with the option "enable_unsafe_pci_noiommu_mode". This should
> > > make it very clear that this mode is not safe. In this mode, there is
> > > no support for unprivileged users; CAP_SYS_ADMIN is required for
> > > access to the necessary dev files. Mixing no-iommu and secure VFIO is
> > > also unsupported, as are any VFIO IOMMU backends other than the
> > > vfio-noiommu backend. Furthermore, unsafe group files are relocated
> > > to /dev/vfio-noiommu/. Upon successful loading in this mode, the
> > > kernel is tainted due to the dummy IOMMU put in place. Unloading of
> > > the module in this mode is also unsupported and will BUG due to the
> > > lack of support for unregistering an IOMMU for a bus type.
> > >
> > > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> >
> > Will this work for distros where changing kernel command line options
> > is really not that practical? We need to boot with one command line
> > and then decide to use IOMMU (or not) later on during the service
> > startup of the dataplane application.
>
> On open? That's too late in my opinion. But maybe the flag can be
> tweaked so that it will probe for iommu, if there - do the
> right thing, but if that fails, enable the dummy one.
> And maybe defer tainting until device open.

The vfio mechanics are that a vfio bus driver, such as vfio-pci, binds to
a device. In the probe function, we check for an iommu group, which
vfio-core then uses to create the vfio group. So there's nothing to
open(), the iommu association needs to be made prior to even binding the
device to vfio-pci. Probing for an iommu can also only be done on a per
bus_type basis, which will likely eventually become a per bus instance
to support heterogeneous iommus, so vfio can't simply determine that an
iommu is not present globally. This is why the new module option
includes the word "pci", so that it can probe for and attach the dummy
iommu specifically on the pci_bus_type.
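
To illustrate the per-device check described above from the userspace
side, here is a minimal sketch (Python, not part of this patch). It
assumes the usual sysfs layout where a device that belongs to an iommu
group has an "iommu_group" symlink in its device directory; the function
name and the sysfs_root parameter are hypothetical and exist only for
this example.

```python
import os


def iommu_group_of(bdf, sysfs_root="/sys"):
    """Return the iommu group number of a PCI device as a string,
    or None if the device has no iommu group.

    bdf is the PCI address, e.g. "0000:01:00.0". sysfs_root is a
    hypothetical parameter so the lookup can be pointed at a test tree.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "iommu_group")
    if not os.path.islink(link):
        # No group: without an iommu (or a dummy one) there is nothing
        # for vfio-core to build a vfio group from.
        return None
    # The symlink target ends in .../iommu_groups/<N>; keep only <N>.
    return os.path.basename(os.path.normpath(os.readlink(link)))
```

A None result here corresponds to the case where probe finds no iommu
group for the device, which is the situation the dummy iommu on
pci_bus_type is meant to cover.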

We can still consider if there are better points at which to initiate
the fake iommu group. I'm trying to think through vfio-pci doing it on
probe(), but it seems pretty ugly.

In this RFC, I specifically avoided making the vfio no-iommu iommu
driver just another modular iommu backend; I wanted it to be tied to a
vfio module option such that vfio behaves differently with open()s and
certain ioctls. I think it would be really confusing to users if safe
and unsafe modes could be used concurrently for different devices.

> Won't address the "old IOMMUs add performance overhead"
> use case but I'm not very impressed by that in any case.

Yep, me neither, certainly not for static mappings. There's a lot of
FUD left over from latencies in the streaming DMA mapping paths where
mappings are created and destroyed at a high rate. That has more to do
with flushing mappings out of the hardware than with iotlb miss latency
or actual translation, which is all that should be in play for most uses
here.

> > Recent experience is that IOMMUs
> > are broken on many platforms so the only way to make a DPDK application
> > is to write a test program that can be used to check if VFIO+IOMMU
> > works first.
>
> In userspace? Well that's just piling up work-arounds. And assuming
> hardware is broken, who knows what's going on security-wise. These
> broken systems need to be identified and black-listed in kernel.
>
> > Also, although you think the long option will set the bar high
> > enough it probably will not satisfy anyone. It is annoying enough that
> > I would just carry a patch to remove the silly requirement.
>
> That sounds reasonable. Anyone who can carry a kernel patch
> does not need the warning.
>
> > And the people who believe
> > all user mode DMA is evil won't be satisfied either.
> > But I really like having the same consistent API for handling device
> > access with IOMMU and when IOMMU will/won't work.
>
> I agree that's good. Makes it easier to migrate applications to
> the safe configuration down the road. Thanks Alex!
>


