Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver

From: Benjamin Gaignard
Date: Thu Jun 19 2025 - 12:28:28 EST

In reply to: Jason Gunthorpe: "Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver"

On 19/06/2025 at 15:47, Jason Gunthorpe wrote:

On Thu, Jun 19, 2025 at 03:12:24PM +0200, Benjamin Gaignard wrote:

+static struct iommu_domain *vsi_iommu_domain_alloc_paging(struct device *dev)
+{
+ struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
+ struct vsi_iommu_domain *vsi_domain;
+
+ vsi_domain = kzalloc(sizeof(*vsi_domain), GFP_KERNEL);
+ if (!vsi_domain)
+ return NULL;
+
+ vsi_domain->dma_dev = iommu->dev;
+ iommu->domain = &vsi_identity_domain;

?? alloc paging should not change the iommu.

Probably this belongs in vsi_iommu_probe_device if the device starts
up in an identity translation mode.

You are right, it is useless here. I will remove it.

+static u32 *vsi_dte_get_page_table(struct vsi_iommu_domain *vsi_domain, dma_addr_t iova)
+{
+ u32 *page_table, *dte_addr;
+ u32 dte_index, dte;
+ phys_addr_t pt_phys;
+ dma_addr_t pt_dma;
+
+ assert_spin_locked(&vsi_domain->dt_lock);
+
+ dte_index = vsi_iova_dte_index(iova);
+ dte_addr = &vsi_domain->dt[dte_index];
+ dte = *dte_addr;
+ if (vsi_dte_is_pt_valid(dte))
+ goto done;
+
+ page_table = (u32 *)iommu_alloc_pages_sz(GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);

Unnecessary casts are not the kernel style, I saw a couple others too

Ugh. This ignores the gfp flags that are passed into map because you
have to force atomic due to the spinlock that shouldn't be there :(
This means it does not set GFP_KERNEL_ACCOUNT when required. It would
be better to continue to use the passed in GFP flags but override them
to atomic mode.

I will add a gfp_t parameter and use it like this:
page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
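
One point worth checking with the OR-based form: as far as I can tell from the kernel's gfp definitions, GFP_KERNEL carries the direct-reclaim bit, so a plain OR with GFP_ATOMIC would leave that bit set while the spinlock is held. Below is a minimal userspace sketch of "keep the caller's flags but force atomic mode". The MOCK_* bit positions are illustrative only (not the kernel's real values), and vsi_atomic_gfp() and mock_gfp_may_sleep() are hypothetical helper names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t mock_gfp_t;

/* Illustrative bit positions only; the real ones live in the kernel headers. */
#define MOCK_GFP_DMA32          (1u << 0)
#define MOCK_GFP_HIGH           (1u << 1)
#define MOCK_GFP_IO             (1u << 2)
#define MOCK_GFP_FS             (1u << 3)
#define MOCK_GFP_DIRECT_RECLAIM (1u << 4)
#define MOCK_GFP_KSWAPD_RECLAIM (1u << 5)
#define MOCK_GFP_ACCOUNT        (1u << 6)

/* Compositions modelled on the kernel's GFP_ATOMIC / GFP_KERNEL macros. */
#define MOCK_GFP_ATOMIC (MOCK_GFP_HIGH | MOCK_GFP_KSWAPD_RECLAIM)
#define MOCK_GFP_KERNEL (MOCK_GFP_DIRECT_RECLAIM | MOCK_GFP_KSWAPD_RECLAIM | \
			 MOCK_GFP_IO | MOCK_GFP_FS)
#define MOCK_GFP_KERNEL_ACCOUNT (MOCK_GFP_KERNEL | MOCK_GFP_ACCOUNT)

/* An allocation may sleep only if direct reclaim is allowed. */
static int mock_gfp_may_sleep(mock_gfp_t gfp)
{
	return (gfp & MOCK_GFP_DIRECT_RECLAIM) != 0;
}

/*
 * Hypothetical helper: keep the caller's modifiers (e.g. the accounting
 * bit), drop the bit that permits sleeping, then add the atomic and
 * DMA32 requirements.
 */
static mock_gfp_t vsi_atomic_gfp(mock_gfp_t gfp)
{
	gfp &= ~MOCK_GFP_DIRECT_RECLAIM;
	return gfp | MOCK_GFP_ATOMIC | MOCK_GFP_DMA32;
}
```

With these mock values, vsi_atomic_gfp(MOCK_GFP_KERNEL_ACCOUNT) keeps the accounting bit and can no longer sleep, whereas MOCK_GFP_KERNEL_ACCOUNT | MOCK_GFP_ATOMIC | MOCK_GFP_DMA32 still has the direct-reclaim bit set.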

+static int vsi_iommu_identity_attach(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
+ struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+ unsigned long flags;
+ int ret;
+
+ if (WARN_ON(!iommu))
+ return -ENODEV;

These WARN_ON's should be removed. ops are never called by the core
without a probed device.

+static int vsi_iommu_attach_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
+ struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+ unsigned long flags;
+ int ret;
+
+ if (WARN_ON(!iommu))
+ return -ENODEV;
+
+ /* iommu already attached */
+ if (iommu->domain == domain)
+ return 0;
+
+ ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
+ if (ret)
+ return ret;

Hurm, this is actually quite bad, now that it is clear the HW is in an
identity mode it is actually a security problem for VFIO to switch the
translation to identity during attach_device. I'd really prefer new
drivers don't make this mistake.

It seems the main thing motivating this is the fact a linked list has
only a single iommu->node so you can't attach the iommu to both the
new/old domain and atomically update the page table base.

Is it possible for the HW to do a blocking behavior? That would be an
easy fix.. You should always be able to force this by allocating a
shared top page table level during probe time and making it entirely
empty while staying always in the paging mode. Maybe there is a less
expensive way.
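
To illustrate the empty top-level table idea, here is a minimal userspace model, not driver code. It assumes a 1024-entry directory indexed by the upper 10 bits of a 32-bit IOVA and a valid bit in bit 0 of each entry; both are assumptions made for this sketch, and all mock_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_NUM_DT_ENTRIES 1024u     /* assumed directory size */
#define MOCK_DTE_VALID      (1u << 0) /* assumed "page table valid" bit */

/*
 * Shared directory table that would be allocated once at probe time.
 * Every entry is zero, so no entry is valid.
 */
static const uint32_t mock_blocking_dt[MOCK_NUM_DT_ENTRIES] = { 0 };

/* Assumed layout: the upper 10 bits of the IOVA select the directory entry. */
static unsigned int mock_iova_dte_index(uint32_t iova)
{
	return iova >> 22;
}

/*
 * Model of the first step of the hardware walk: returns true if the walk
 * can continue, false if the access faults at the directory level.
 */
static bool mock_dt_walk_ok(const uint32_t *dt, uint32_t iova)
{
	return (dt[mock_iova_dte_index(iova)] & MOCK_DTE_VALID) != 0;
}
```

In this model every IOVA faults while mock_blocking_dt is installed, so the hardware stays in paging mode and never passes DMA through untranslated during the switch.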

Otherwise you probably have to work more like the other drivers and
allocate a struct for each attachment so you can have the iommu
attached to two domains during the switch over and never drop to an
identity mode.

I will remove the switch to identity domain and it will work fine.

+ iommu->domain = domain;
+
+ spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
+ list_add_tail(&iommu->node, &vsi_domain->iommus);
+ spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
+
+ ret = pm_runtime_get_if_in_use(iommu->dev);
+ if (!ret || WARN_ON_ONCE(ret < 0))
+ return 0;

This probably should have a comment, is the idea that resume will set up
the domain? How does locking of iommu->domain work in that case?

Maybe the suspend resume paths should be holding the group mutex..

+ ret = vsi_iommu_enable(iommu);
+ if (ret)
+ WARN_ON(vsi_iommu_identity_attach(&vsi_identity_domain, dev));

Is this necessary though? vsi_iommu_enable failure cases don't change
the HW, and a few lines above was an identity_attach. Just delay
setting iommu->domain until it succeeds, and this is a simple error.

I think I will change vsi_iommu_enable() prototype to:
static int vsi_iommu_enable(struct vsi_iommu *iommu, struct iommu_domain *domain)
and do iommu->domain = domain; at the end of the function if everything goes correctly.
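
A minimal userspace sketch of that shape, assuming the hardware setup step either fully succeeds or leaves the hardware untouched. The mock_* types and mock_hw_setup() are hypothetical stand-ins for the driver's structures and register programming:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_domain {
	int id;
};

struct mock_iommu {
	struct mock_domain *domain; /* currently attached domain */
	int hw_fail;                /* test knob: non-zero makes setup fail */
};

/* Stand-in for the register programming done by the enable path. */
static int mock_hw_setup(struct mock_iommu *iommu)
{
	return iommu->hw_fail ? -EIO : 0;
}

/*
 * The domain pointer is committed only after the hardware step has
 * succeeded, so a failure needs no rollback: iommu->domain still names
 * the previous domain.
 */
static int mock_iommu_enable(struct mock_iommu *iommu,
			     struct mock_domain *domain)
{
	int ret;

	ret = mock_hw_setup(iommu);
	if (ret)
		return ret;

	iommu->domain = domain;
	return 0;
}
```

With this ordering the attach path can simply return the error from the enable call, without re-attaching the identity domain.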

iommu->domain = domain;

+static struct iommu_ops vsi_iommu_ops = {
+ .identity_domain = &vsi_identity_domain,

Add:

.release_domain = &vsi_identity_domain,

Which will cause the core code to automatically run through to
vsi_iommu_disable() prior to calling vsi_iommu_release_device(), which
will avoid UAF problems.

Also, should the probe functions be doing some kind of validation that
there is only one struct device attached?

Which kind of validation?

Jason
