Re: [RFC PATCH 2/2] ACPI/IORT: use swiotlb_dma_ops when smmu probe failed

From: Robin Murphy
Date: Mon Apr 09 2018 - 08:10:41 EST


On 08/04/18 09:10, Wang, Dongsheng wrote:

-----Original Message-----
From: Robin Murphy [mailto:robin.murphy@xxxxxxx]
Sent: Thursday, April 05, 2018 2:57 AM
To: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>; Wang, Dongsheng
<dongsheng.wang@xxxxxxxxxxxxxxxx>
Cc: rjw@xxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; hanjun.guo@xxxxxxxxxx;
sudeep.holla@xxxxxxx; Zheng, Joey <yu.zheng@xxxxxxxxxxxxxxxx>;
linux-acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: [æéäåèååéé] Re: [RFC PATCH 2/2] ACPI/IORT: use
swiotlb_dma_ops when smmu probe failed

On 04/04/18 17:01, Lorenzo Pieralisi wrote:
[+cc Robin]

On Thu, Mar 29, 2018 at 03:01:00AM -0700, Wang Dongsheng wrote:
If SMMU probe failed, master should use swiotlb as dma ops.
SMMU may probe failed with specified environment, so there
are not any iommu resources in iommu_device_list.

The master will always get EPROBE_DEFER from really_probe
(dma_configure) but in fact SMMU has probe failed. The issue
causes all of masters failed to be driven.

Let's just take a step back - why is SMMU probe failing? That seems to
be the primary issue here, because it implies that either your hardware,
firmware or kernel is broken, any of which would make boot failure
somewhat unsurprising anyway.

It's actually not a hardware issue. This is my test case, just return
-EINVAL in arm_smmu_device_probe. The HW probe(arm_smmu_device_hw_probe)
is just part of SMMU driver probe and the failure may be caused by SW. So
I design this case, just make sure even if SMMU probe failed that cause by SW,
the MASTER also can work. _Because of our SMMU default mode is bypass._

I don't think it's particularly justifiable to make core API changes for the sake of contrived testcases. On real systems, the SMMU is a fundamental system component which is no more expected to fail probe than, say, the GIC, and as such if it *does* fail then further progress is on a best-effort basis at most. Just because *your* system happens to work fine in this state doesn't make it true for every SMMU implementation and integration that Linux may ever run on.

If you want the kernel to ignore an SMMU, either configure out the driver, or don't describe that SMMU in firmware in the first place.

I added Robin to pick his brain. An alternative would consist
in using a bus notifier to prevent deferred probing once the SMMU
driver probing failed but that seems backwards given that a major
reason to move to deferred probing was to remove the bus notifiers
dependency in the first place.

It seems to me this is both an OF/ACPI issue - it is not an IORT
only problem.

Yes, this is just an instance of the general probe-deferral problem,
e.g. once you have multiple dependencies it's possible to end up in a
stalemate where everything including the IOMMU ends up on the deferred
probe list with nothing to kick it and make progress again.

Furthermore it seems to me that the whole premise in this patch is
flawed,
Ditto. :)


since even genuine probe failure may well be transient - just
because one attempt failed doesn't mean a later attempt can't succeed.
Thus "the most recent probe attempt failed" cannot be considered a
fundamentally different state from "no driver is currently bound".

Agree, the genuine probe failure may well be transient. But there is
depend on SMMU probe(IOMMU instance) status. There are two situations:

1. MASTER probing, SMMU doesn't probe yet.
This case will match "the transient failure".
really_probe get an EPROBE_DEFER from IORT and the MASTER probe will be
delayed until SMMU probe successful.
2. MASTER probing, SMMU probe has failed.
really_probe will always get an EPROBE_DEFER from IORT, because kernel
has build in SMMU driver.(iort_iommu_driver_enabled) And the master
never cannot do probe.

The case 2 is I want to handle.

Handle it by not deliberately breaking the SMMU driver. In all other cases, either re-triggering SMMU probe might make it succeed (i.e. the DL_DEV_PROBE_FAILED state is meaningless), or things are so broken that you're probably dead in the water anyway.

Robin.


Cheers,
-Dongsheng

Robin.


Lorenzo

Signed-off-by: Wang Dongsheng <dongsheng.wang@xxxxxxxxxxxxxxxx>
---
drivers/acpi/arm64/iort.c | 39
+++++++++++++++++++++++++++++++++------
1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index e2f7bdd..a6f4c27 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -774,17 +774,45 @@ static int arm_smmu_iort_xlate(struct device
*dev, u32 streamid,
return ret;
}

-static inline bool iort_iommu_driver_enabled(u8 type)
+static int iort_check_dev_dl_status(struct device *dev, void *data)
{
+ struct fwnode_handle *fwnode = data;
+
+ if (dev->fwnode != fwnode)
+ return 0;
+
+ if (dev->links.status == DL_DEV_PROBE_FAILED)
+ return -ENODEV;
+
+ return -EPROBE_DEFER;
+}
+
+static int iort_iommu_driver_enabled(u8 type, struct fwnode_handle
*fwnode)
+{
+ bool buildin;
+ int ret;
+
switch (type) {
case ACPI_IORT_NODE_SMMU_V3:
- return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
+ buildin = IS_BUILTIN(CONFIG_ARM_SMMU_V3);
+ break;
case ACPI_IORT_NODE_SMMU:
- return IS_BUILTIN(CONFIG_ARM_SMMU);
+ buildin = IS_BUILTIN(CONFIG_ARM_SMMU);
+ break;
default:
pr_warn("IORT node type %u does not describe an SMMU\n",
type);
- return false;
+ buildin = false;
}
+
+ if (!buildin)
+ return -ENODEV;
+
+ ret = bus_for_each_dev(&platform_bus_type, NULL, fwnode,
+ iort_check_dev_dl_status);
+ if (!ret)
+ return -EPROBE_DEFER;
+
+ return ret;
}

#ifdef CONFIG_IOMMU_API
@@ -919,8 +947,7 @@ static int iort_iommu_xlate(struct device *dev,
struct acpi_iort_node *node,
*/
ops = iommu_ops_from_fwnode(iort_fwnode);
if (!ops)
- return iort_iommu_driver_enabled(node->type) ?
- -EPROBE_DEFER : -ENODEV;
+ return iort_iommu_driver_enabled(node->type, iort_fwnode);

return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
}
--
2.7.4