Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

From: Jason Gunthorpe
Date: Fri Jan 27 2023 - 10:43:43 EST


On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:

> The current implementation of device_iommu_mapped() just dates back to when
> dev->iommu_group was the only per-device thing we had, so in principle I
> don't have any conceptual objection to redefining it in terms of "device has
> ops" rather than "device has a group", but as things stand you'd still have
> to do something about PPC first (I know Jason had been pushing on that, but
> I've not kept track of where it got to).

PPC hasn't moved at all, AFAICT. In a few more months I'm going to
suggest we delete the special VFIO support due to it being broken,
distros already having turned it off and nobody caring enough to fix
it..

What does device_iommu_mapped() even really mean?

Looking at usages..

These are fixing SOC HW bugs/issues - the desire seems to be "is the SOC's
IOMMU enabled"

drivers/char/agp/intel-gtt.c: device_iommu_mapped(&intel_private.pcidev->dev));
drivers/dma/sh/rcar-dmac.c: if (device_iommu_mapped(&pdev->dev))
drivers/gpu/drm/i915/i915_utils.c: if (device_iommu_mapped(i915->drm.dev))
?
drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) || device_iommu_mapped(dev)) {
drivers/usb/host/xhci.c: if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
drivers/crypto/qat/qat_common/adf_sriov.c: if (!device_iommu_mapped(&pdev->dev))
?

These seem to be trying to decide if iommu_domain's can be used (and
they can't be on power):

drivers/gpu/drm/msm/msm_drv.c: if (device_iommu_mapped(mdp_dev))
drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev) ||
drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev->parent);
drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c: if (device_iommu_mapped(dev)) {
drivers/gpu/drm/rockchip/rockchip_drm_drv.c: if (!device_iommu_mapped(dev))
drivers/gpu/drm/tegra/uapi.c: if (device_iommu_mapped(client->base.dev) && client->ops->can_use_memory_ctx) {
drivers/gpu/host1x/context.c: if (!fwspec || !device_iommu_mapped(&ctx->dev)) {
drivers/infiniband/hw/usnic/usnic_ib_main.c: if (!device_iommu_mapped(&pdev->dev)) {

Yikes, trying to map DMA addresses programmed into devices back to CPU addresses:

drivers/misc/habanalabs/common/debugfs.c: if (!user_address || device_iommu_mapped(&hdev->pdev->dev)) {
drivers/misc/habanalabs/gaudi2/gaudi2.c: if (!device_iommu_mapped(&hdev->pdev->dev))

And then sequencing the call to iommu_probe_device() which doesn't
apply to power:

drivers/acpi/scan.c: if (!err && dev->bus && !device_iommu_mapped(dev))
drivers/iommu/of_iommu.c: if (!err && dev->bus && !device_iommu_mapped(dev))

Leaving these:

arch/powerpc/kernel/eeh.c: if (device_iommu_mapped(dev)) {

This is only used to support eeh_iommu_group_to_pe which is only
caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
now this is uncallable, and when power is fixed this will work
properly.

arch/powerpc/kernel/iommu.c: if (device_iommu_mapped(dev)) {
arch/powerpc/kernel/iommu.c: if (!device_iommu_mapped(dev)) {

These should both be replaced with some kind of 'device has iommu group', since
it is really driving ppc unique group logic.

So, I'd say Baolu's approach is the right thing, just replace the
above two in ppc with something else.

Jason