Re: arm64 iommu groups issue

From: Robin Murphy
Date: Thu Feb 13 2020 - 14:40:48 EST


On 13/02/2020 3:49 pm, John Garry wrote:

The underlying issue is that, for historical reasons, OF/IORT-based
IOMMU drivers have ended up with group allocation being tied to endpoint
driver probing via the dma_configure() mechanism (long story short,
driver probe is the only thing which can be delayed in order to wait for
a specific IOMMU instance to be ready).However, in the meantime, the
IOMMU API internals have evolved sufficiently that I think there's a way
to really put things right - I have the spark of an idea which I'll try
to sketch out ASAP...


OK, great.

Hi Robin,

I was wondering if you have had a chance to consider this problem again?

Yeah, sorry, more things came up such that it still hasn't been P yet ;)

Lorenzo and I did get as far as discussing it a bit more and writing up a ticket, so it's formally on our internal radar now, at least.

One simple idea could be to introduce a device link between the endpoint device and its parent bridge to ensure that they probe in order, as expected in pci_device_group():

Subject: [PATCH] PCI: Add device link to ensure endpoint device driver probes after bridge

It is required to ensure that a device driver for an endpoint will probe
after the parent port driver, so add a device link for this.

---
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 512cb4312ddd..4b832ad25b20 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2383,6 +2383,7 @@ static void pci_set_msi_domain(struct pci_dev *dev)
Âvoid pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
Â{
ÂÂÂÂ int ret;
+ÂÂÂ struct device *parent;

ÂÂÂÂ pci_configure_device(dev);

@@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
ÂÂÂÂ /* Set up MSI IRQ domain */
ÂÂÂÂ pci_set_msi_domain(dev);

+ÂÂÂ parent = dev->dev.parent;
+ÂÂÂ if (parent && parent->bus == &pci_bus_type)
+ÂÂÂÂÂÂÂ device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
+
ÂÂÂÂ /* Notifier could use PCI capabilities */
ÂÂÂÂ dev->match_driver = false;
ÂÂÂÂ ret = device_add(&dev->dev);
--

This would work, but the problem is that if the port driver fails in probing - and not just for -EPROBE_DEFER - then the child device will never probe. This very thing happens on my dev board. However we could expand the device links API to cover this sort of scenario.

Yes, that's an undesirable issue, but in fact I think it's mostly indicative that involving drivers in something which is designed to happen at a level below drivers is still fundamentally wrong and doomed to be fragile at best.

Another thought that crosses my mind is that when pci_device_group() walks up to the point of ACS isolation and doesn't find an existing group, it can still infer that everything it walked past *should* be put in the same group it's then eventually going to return. Unfortunately I can't see an obvious way for it to act on that knowledge, though, since recursive iommu_probe_device() is unlikely to end well.

As for alternatives, it looks pretty difficult to me to disassociate the group allocation from the dma_configure path.

Indeed it's non-trivial, but it really does need cleaning up at some point.

Having just had yet another spark, does something like the untested super-hack below work at all? I doubt it's a viable direction to take in itself, but it could be food for thought if it at least proves the concept.

Robin.

----->8-----
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index aa3ac2a03807..554cde76c766 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3841,20 +3841,20 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops)
int err;

#ifdef CONFIG_PCI
- if (pci_bus_type.iommu_ops != ops) {
+ if (1) {
err = bus_set_iommu(&pci_bus_type, ops);
if (err)
return err;
}
#endif
#ifdef CONFIG_ARM_AMBA
- if (amba_bustype.iommu_ops != ops) {
+ if (1) {
err = bus_set_iommu(&amba_bustype, ops);
if (err)
goto err_reset_pci_ops;
}
#endif
- if (platform_bus_type.iommu_ops != ops) {
+ if (1) {
err = bus_set_iommu(&platform_bus_type, ops);
if (err)
goto err_reset_amba_ops;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 660eea8d1d2f..b81ae2b4d4fb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1542,6 +1542,14 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
return err;
}

+static int iommu_bus_replay(struct device *dev, void *data)
+{
+ if (dev->iommu_group)
+ return 0;
+
+ return add_iommu_group(dev, data);
+}
+
/**
* bus_set_iommu - set iommu-callbacks for the bus
* @bus: bus.
@@ -1564,6 +1572,9 @@ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops)
return 0;
}

+ if (bus->iommu_ops == ops)
+ return bus_for_each_dev(bus, NULL, NULL, iommu_bus_replay);
+
if (bus->iommu_ops != NULL)
return -EBUSY;