Re: [PATCH 1/9] iommu: Move iommu fault data to linux/iommu.h

From: Baolu Lu
Date: Wed Jul 12 2023 - 23:49:08 EST


On 2023/7/13 11:22, Tian, Kevin wrote:
From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
Sent: Wednesday, July 12, 2023 5:34 PM

On Wed, Jul 12, 2023 at 10:07:22AM +0800, Baolu Lu wrote:
+/**
+ * struct iommu_fault_unrecoverable - Unrecoverable fault data
+ * @reason: reason of the fault, from &enum iommu_fault_reason
+ * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
+ * @pasid: Process Address Space ID
+ * @perm: requested permission access using by the incoming transaction
+ * (IOMMU_FAULT_PERM_* values)
+ * @addr: offending page address
+ * @fetch_addr: address that caused a fetch abort, if any
+ */
+struct iommu_fault_unrecoverable {
+ __u32 reason;
+#define IOMMU_FAULT_UNRECOV_PASID_VALID (1 << 0)
+#define IOMMU_FAULT_UNRECOV_ADDR_VALID (1 << 1)
+#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID (1 << 2)
+ __u32 flags;
+ __u32 pasid;
+ __u32 perm;
+ __u64 addr;
+ __u64 fetch_addr;
+};
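
For readers following the flags discussion: the three IOMMU_FAULT_UNRECOV_*_VALID bits say which of pasid, addr and fetch_addr carry meaningful data. A minimal userspace sketch of how a consumer might honour them is below. The field layout and flag values are copied from the quoted patch; the fixed-width types stand in for __u32/__u64, and the two helper functions are hypothetical names, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the quoted patch. */
#define IOMMU_FAULT_UNRECOV_PASID_VALID      (1 << 0)
#define IOMMU_FAULT_UNRECOV_ADDR_VALID       (1 << 1)
#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID (1 << 2)

/* Same layout as the quoted struct; uint32_t/uint64_t replace __u32/__u64. */
struct iommu_fault_unrecoverable {
	uint32_t reason;
	uint32_t flags;
	uint32_t pasid;
	uint32_t perm;
	uint64_t addr;
	uint64_t fetch_addr;
};

/* Hypothetical helper: copy out the PASID only if the flag marks it valid. */
static bool unrecov_fault_pasid(const struct iommu_fault_unrecoverable *f,
				uint32_t *pasid)
{
	if (!(f->flags & IOMMU_FAULT_UNRECOV_PASID_VALID))
		return false;
	*pasid = f->pasid;
	return true;
}

/* Hypothetical helper: copy out the faulting address only if it is valid. */
static bool unrecov_fault_addr(const struct iommu_fault_unrecoverable *f,
			       uint64_t *addr)
{
	if (!(f->flags & IOMMU_FAULT_UNRECOV_ADDR_VALID))
		return false;
	*addr = f->addr;
	return true;
}
```

With flags set to IOMMU_FAULT_UNRECOV_ADDR_VALID only, unrecov_fault_addr() returns true and fills in the address, while unrecov_fault_pasid() returns false and leaves its output untouched.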

Currently there is no handler for unrecoverable faults.

Yes those were meant for guest injection. Another goal was to replace
report_iommu_fault(), which also passes unrecoverable faults to host
drivers. Three drivers use that API:
* usnic just prints the error, which could be done by the IOMMU driver,
* remoteproc attempts to recover from the crash,
* msm attempts to handle the fault, or at least recover from the crash.

I was not aware of them. Thanks for pointing them out.


So the first one can be removed, and the others could move over to IOPF
(which may need to indicate that the fault is not actually recoverable by
the IOMMU) and return IOMMU_PAGE_RESP_INVALID.

Yep, presumably we should have just one interface to handle faults.



Both Intel/ARM register iommu_queue_iopf() as the device fault handler.
It returns -EOPNOTSUPP for unrecoverable faults.
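
The behaviour described above can be illustrated with a small userspace mock. It is only a sketch: the mock_* names and the enum are made up for illustration and are not the kernel's definitions. The only thing taken from the thread is that a fault which is not a recoverable page request is rejected with -EOPNOTSUPP.

```c
#include <errno.h>

/* Mock fault types, for illustration only (not the kernel's enum). */
enum mock_fault_type {
	MOCK_FAULT_UNRECOVERABLE = 1,
	MOCK_FAULT_PAGE_REQ,
};

struct mock_fault {
	enum mock_fault_type type;
};

/*
 * Mock of a device fault handler that only understands page requests.
 * Anything else is refused, so unrecoverable fault data never reaches
 * a consumer through this path.
 */
static int mock_queue_iopf(const struct mock_fault *fault)
{
	if (fault->type != MOCK_FAULT_PAGE_REQ)
		return -EOPNOTSUPP;	/* not a recoverable page fault */

	/* A real handler would queue the page request here. */
	return 0;
}
```

If every registered handler has this shape, nothing in the kernel ever reads the unrecoverable fault fields, which is the basis for calling them dead code.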

In your series the common iommu_handle_io_pgfault() also only works
for PRQ.

It kind of suggests the above definitions are dead code, though arm-smmu-v3
does attempt to set them.

It's probably the right time to remove them.

In the future, even if there is a need to forward unrecoverable
faults to the user via iommufd, fault reasons reported by the physical
IOMMU don't make any sense to the guest.

I guess it depends on the architecture? The SMMU driver can report only
stage-1 faults through iommu_report_device_fault(), which are faults due
to a guest misconfiguring the tables assigned to it. At the moment
arm_smmu_handle_evt() only passes down stage-1 page table errors, the rest
is printed by the host.

In that case the kernel just needs to notify the vIOMMU that an error happened,
along with the access permissions (r/w/e/p). The vIOMMU can figure out the
reason itself by walking the stage-1 page table. Likely it will find the same
reason as the host reports, but that sounds like a conceptually cleaner path.


Presumably the vIOMMU
should walk guest configurations to set a fault reason which makes sense
from guest p.o.v.

I am fine with removing the unrecoverable fault data. But it was added by Jean,
so I'd like to know his opinion on this.

Passing errors to the guest could be a useful diagnostics tool for
debugging, once the guest gets more control over the IOMMU hardware, but
it doesn't have a purpose beyond that. It could be the only tool
available, though: to avoid a guest voluntarily flooding the host logs by
misconfiguring its tables, we may have to stop the host from printing
errors that come from guest misconfiguration, in which case there won't be
any diagnostics available for guest bugs.

For now I don't mind if they're removed, if there is an easy way to
reintroduce them later.


We can keep whatever is required to satisfy the kernel drivers which
want to know the fault.

But anything invented for the old uAPI (e.g. fault_reason) let's remove
and redefine later, when introducing the support to the user.

Okay, I will do this in the next version.

Best regards,
baolu