Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain

From: Robin Murphy
Date: Wed May 11 2016 - 09:48:57 EST


On 11/05/16 10:44, Eric Auger wrote:
Hi Robin,
On 05/11/2016 11:31 AM, Robin Murphy wrote:
On 11/05/16 09:38, Eric Auger wrote:
Hi Robin, Alex,
On 05/10/2016 07:24 PM, Robin Murphy wrote:
Hi Eric,

On 10/05/16 17:10, Eric Auger wrote:
Hi Alex,
On 05/10/2016 12:49 AM, Alex Williamson wrote:
On Wed, 4 May 2016 11:54:16 +0000
Eric Auger <eric.auger@xxxxxxxxxx> wrote:

On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is
abstracted
by the msi controller. vfio_safe_irq_domain allows to check whether
interrupts are "safe" for a given device. They are if the device does
not use MSI

Are we sure we're not opening a security hole here? An MSI is
simply a
DMA write, so really whether or not a device uses MSI is irrelevant.
If it can generate a DMA to the MSI doorbell then we need to be
protected and I think we pretty much need to assume that devices are
DMA capable. Do the MSI domain checks cover this?
Let me try to rephrase: we check the device is not attached to an MSI
controller (I think this is the semantic of dev_get_msi_domain(dev)).

If it is not, we don't have to care about MSI isolation: there will be
no IOMMU binding between the device and any MSI doorbell. If it is we
check the msi domain is backed by an MSI controller able to perform MSI
isolation.

So effectively "usage of MSIs" is improper - since it is decided after
the group attachment anyway - and the commit message should rather
state "if the device is linked to an MSI controller" (dt msi-parent
notion I think).

Hmm, I think Alex has a point here - on a GICv2m I can happily fire
arbitrary MSIs from _a shell_ (using /dev/mem), and the CPUs definitely
aren't in an MSI domain, so I don't think it's valid to assume that a
device using only wired interrupts, therefore with no connection to any
MSI controller, isn't still capable of maliciously spewing DMA all over
any and every doorbell region in the system.

Sorry but I still don't get the point. For the device to reach the
doorbell there must be an IOMMU mapping.

Only when the doorbell is _downstream_ of IOMMU translation.
Yes but that's the root hypothesis with VFIO assignment, right?

Except if the no-iommu option is explicitly set by the userspace, there
must be an IOMMU downstream to the device that protects all the DMA
transactions.

The IOMMU is downstream of the device, clearly, but the MSI machinery itself could be _in between_ the two - say, integrated into a PCIe root complex served by an external SMMU. The doorbell is carved out of the upstream address space so that a write targeting the appropriate address hits it directly and never even goes out to the SMMU - e.g. http://thread.gmane.org/gmane.linux.kernel.pci/47174/focus=47268

- if the device is not attached to an MSI domain, there won't be any
doorbell iommu mapping built by this series, so no risk, right?

The device will be allowed to reach only memory iommu mapped by
userspace with VFIO DMA MAP standard API. Of course if the userspace can
mmap all the host PA that's a more general issue, right?

- If the device is attached to an MSI domain (msi-parent link), 2 cases:
1) the MSI controller advertises MSI isolation (ITS cases), no risk
2) the MSI controller does not advertise MSI isolation (GICv2m), there
is a security hole.
a) by default we reject the device attachment
b) if the userspace overrides the safe interrupt option he accepts
the security hole

What am I missing?

The x86-with-interrupt-remapping-disabled case. Currently, without
unsafe_interrupts, everything is rejected - with this patch, we'll still
reject anything with an MSI domain, but e.g. legacy PCI devices using
INTx would be free to scribble all over the un-translated interrupt
region willy-nilly.
Assuming we have an IOMMU, either we have translated transactions or
faults? So if legacy PCI devices try to write into a doorbell, there
must be faults because there is no IOMMU mapping for doorbells?

To the best of my understanding, having been wading through the VT-d spec, the case on x86 is as above (OK, so strictly it's more "in parallel with" than "in front of") - writes outside the 0xFEExxxxx interrupt region go to the DMA remapping units, whereas writes within that region go to the interrupt remapping unit, and if that's disabled or not implemented, simply sail straight through to the APIC beyond.

Robin.

Best Regards

Eric
That's the subtle, but significant, change in
behaviour.

Robin.

Best Regards

Eric

Robin.

Does it sound better?

Regards

Eric

or if the device uses MSI and the msi-parent controller
supports IRQ remapping.

Then we check at group level if all devices have safe interrupts: if
not,
we only allow the group to be attached if allow_unsafe_interrupts is
set.

At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
changed in next patch.

Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>

---
v3 -> v4:
- rename vfio_msi_parent_irq_remapping_capable into
vfio_safe_irq_domain
and irq_remapping into safe_irq_domains

v2 -> v3:
- protect vfio_msi_parent_irq_remapping_capable with
CONFIG_GENERIC_MSI_IRQ_DOMAIN
---
drivers/vfio/vfio_iommu_type1.c | 44
+++++++++++++++++++++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c
b/drivers/vfio/vfio_iommu_type1.c
index 4d3a6f1..2fc8197 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,8 @@
#include <linux/vfio.h>
#include <linux/workqueue.h>
#include <linux/msi-iommu.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>

#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson
<alex.williamson@xxxxxxxxxx>"
@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev,
void *data)
return 0;
}

+/**
+ * vfio_safe_irq_domain: returns whether the irq domain
+ * the device is attached to is safe with respect to MSI isolation.
+ * If the irq domain is not an MSI domain, we return it is safe.
+ *
+ * @dev: device handle
+ * @data: unused
+ * returns 0 if the irq domain is safe, -1 if not.
+ */
+static int vfio_safe_irq_domain(struct device *dev, void *data)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+ struct irq_domain *domain;
+ struct msi_domain_info *info;
+
+ domain = dev_get_msi_domain(dev);
+ if (!domain)
+ return 0;
+
+ info = msi_get_domain_info(domain);
+
+ if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
+ return -1;
+#endif
+ return 0;
+}
+
static int vfio_iommu_replay(struct vfio_iommu *iommu,
struct vfio_domain *domain)
{
@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void
*iommu_data,
struct vfio_group *group, *g;
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL;
- int ret;
+ int ret, safe_irq_domains;

mutex_lock(&iommu->lock);

@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void
*iommu_data,

group->iommu_group = iommu_group;

+ /*
+ * Determine if all the devices of the group have a safe irq
domain
+ * with respect to MSI isolation
+ */
+ safe_irq_domains = !iommu_group_for_each_dev(iommu_group, &bus,
+ vfio_safe_irq_domain);
+
/* Determine bus_type in order to allocate a domain */
ret = iommu_group_for_each_dev(iommu_group, &bus,
vfio_bus_type);
if (ret)
@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void
*iommu_data,
INIT_LIST_HEAD(&domain->group_list);
list_add(&group->next, &domain->group_list);

+ /*
+ * to advertise safe interrupts either the IOMMU or the MSI
controllers
+ * must support IRQ remapping/interrupt translation
+ */
if (!allow_unsafe_interrupts &&
- !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+ (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) &&
!safe_irq_domains)) {
pr_warn("%s: No interrupt remapping support. Use the
module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU
support on this platform\n",
__func__);
ret = -EPERM;