Re: [PATCH v3 7/7] iommu/riscv: Paging domain support

From: Tomasz Jeznach
Date: Mon May 06 2024 - 22:22:28 EST


On Sun, May 5, 2024 at 8:46 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote:
> > > For detach I think yes:
> > >
> > > Inv CPU Detach CPU
> > >
> > > write io_pte Update device descriptor
> > > rcu_read_lock
> > > list_for_each
> > > <make invalidation command> <make description inv cmd>
> > > dma_wmb() dma_wmb()
> > > <doorbell> <cmd doorbell>
> > > rcu_read_unlock
> > > list_del_rcu()
> > > <wipe ASID>
> > >
> > > In this case I think we never miss an invalidation, the list_del is
> > > always after the HW has been fully fenced, so I don't think we can
> > > have any issue. Maybe a suprious invalidation if the ASID gets
> > > re-used, but who cares.
> > >
> > > Attach is different..
> > >
> > > Inv CPU Attach CPU
> > >
> > > write io_pte
> > > rcu_read_lock
> > > list_for_each // empty
> > > list_add_rcu()
> > > Update device descriptor
> > > <make description inv cmd>
> > > dma_wmb()
> > > <cmd doorbell>
> > > rcu_read_unlock
> > >
> > > As above shows we can "miss" an invalidation. The issue is narrow, the
> > > io_pte could still be sitting in write buffers in "Inv CPU" and not
> > > yet globally visiable. "Attach CPU" could get the device descriptor
> > > installed in the IOMMU and the IOMMU could walk an io_pte that is in
> > > the old state. Effectively this is because there is no release/acquire
> > > barrier passing the io_pte store from the Inv CPU to the Attach CPU to the
> > > IOMMU.
> > >
> > > It seems like it should be solvable somehow:
> > > 1) Inv CPU releases all the io ptes
> > > 2) Attach CPU acquires the io ptes before updating the DDT
> > > 3) Inv CPU acquires the RCU list in such a way that either attach
> > > CPU will acquire the io_pte or inv CPU will acquire the RCU list.
> > > 4) Either invalidation works or we release the new iopte to the SMMU
> > > and don't need it.
> > >
> > > But #3 is a really weird statement. smb_mb() on both sides may do the
> > > job??
> > >
> >
> > Actual attach sequence is slightly different.
> >
> > Inv CPU Attach CPU
> >
> > write io_pte
> > rcu_read_lock
> > list_for_each // empty
> > list_add_rcu()
> > IOTLB.INVAL(PSCID)
> > <make description inv cmd>
> > dma_wmb()
> > <cmd doorbell>
> > rcu_read_unlock
> >
> > I've tried to cover this case with riscv_iommu_iotlb_inval() called
> > before the attached domain is visible to the device.
>
> That invalidation shouldn't do anything. If this is the first attach
> of a PSCID then the PSCID had better already be empty, it won't become
> non-empty until the DDT entry is installed.
>
> And if it is the second attach then the Inv CPU is already taking care
> of things, no need to invalidate at all.
>
> Regardless, there is still a theortical race that the IOPTEs haven't
> been made visible yet because there is still no synchronization with
> the CPU writing them.
>
> So, I don't think this solves any problem. I belive you need the
> appropriate kind of CPU barrier here instead of an invalidation.
>

Yes. There was a small, but still plausible race w/ IOPTEs visibility
to the IOMMU.
For v5 I'm adding two barriers to the inval/detach flow, I believe
should cover it.

1) In riscv_iommu_iotlb_inval() unconditional dma_wmb() to make any
pending writes to PTEs visible to the IOMMU device. This should cover
the case when list_add_rcu() update is not yet visible in the
_iotlb_inval() sequence, for the first time the domain is attached to
the IOMMU.

Inv CPU Attach CPU
write io_pte
dma_wmb (1)
rcu_read_lock
list_for_each // empty list_add_rcu()
smp_wmb (2)
Update device descriptor
<make description inv cmd>
// PTEs are visible to the HW (*1)
dma_wmb()
<cmd doorbell>
rcu_read_unlock

2) In riscv_iommu_bond_link() write memory barrier to ensure list
update is visible before IOMMU descriptor update. If stale data has
been fetched by the HW, inval CPU will run iotlb-invalidation
sequence. There is a possibility that IOMMU will fetch correct PTEs
and will receive unnecessary IOTLB inval, but I don't think anyone
would care.

Inv CPU Attach CPU
write io_pte list_add_rcu()
smp_wmb (2)
Update device descriptor
<make description inv cmd>
// HW might fetch stale PTEs
dma_wmb()
<cmd doorbell>
dma_wmb (1)
rcu_read_lock
list_for_each // non-empty (*2)
<make iotlb inval cmd>
dma_wmb()
<cmd doorbell>
rcu_read_unlock

3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache
on the last domain unlink from the IOMMU.

Thank you for pointing this out. Let me know if that makes sense.

Best,
- Tomasz

> Jason