Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls

From: Xu Yilun
Date: Thu Jul 17 2025 - 05:41:36 EST


On Wed, Jul 16, 2025 at 03:22:06PM -0700, Ackerley Tng wrote:
> Yan Zhao <yan.y.zhao@xxxxxxxxx> writes:
>
> > On Tue, Jun 24, 2025 at 07:10:38AM -0700, Vishal Annapurve wrote:
> >> On Tue, Jun 24, 2025 at 6:08 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >> >
> >> > On Tue, Jun 24, 2025 at 06:23:54PM +1000, Alexey Kardashevskiy wrote:
> >> >
> >> > > Now, I am rebasing my RFC on top of this patchset and it fails in
> >> > > kvm_gmem_has_safe_refcount() because the IOMMU holds references to all
> >> > > these folios in my RFC.
> >> > >
> >> > > So what is the expected sequence here? Does userspace unmap a DMA
> >> > > page and map it back right away, all from userspace? The end result
> >> > > would be exactly the same, which seems useless. And IOMMU TLB
> >>
> >> As Jason described, ideally the IOMMU, just like KVM, should:
> >> 1) Rely directly on guest_memfd for pinning -> no page refcounts taken
> >> by the IOMMU stack
> > In TDX Connect, the TDX module and TDs do not trust the VMM. So it's up to
> > the TDs to inform the TDX module about which pages they use for DMA
> > purposes. So, if a page is regarded as pinned by a TD for DMA, the TDX
> > module will fail the unmap of that page from the S-EPT.
> >
> > If the IOMMU side does not increase the refcount, IMHO some way to
> > indicate that certain PFNs are used by TDs for DMA is still required, so
> > that guest_memfd can reject the request before attempting the actual
> > unmap. Otherwise, the unmap of TD-DMA-pinned pages will fail.
> >
> > Upon this kind of unmapping failure, it also doesn't help for the host to
> > retry unmapping without the TD unpinning first.
> >
> >
>
> Yan, Yilun, would it work if, on conversion,
>
> 1. guest_memfd notifies the IOMMU that a conversion is about to happen
> for a PFN range

The pin is released by a guest firmware call. By the time the VMM gets the
conversion request, the page is already physically unpinned. So I agree
with Jason that, from a SW POV, the pinning doesn't have to reach the IOMMU.
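To make the ordering concrete, here is a rough toy model in C. Everything
in it is hypothetical: struct tdx_model, guest_fw_release_dma_pin() and
host_sept_unmap() are made-up names for illustration, not the real TDX
module ABI or any KVM/IOMMU API. It only shows that once the guest
firmware call has dropped the pin, the later S-EPT unmap succeeds without
any notification passing through the IOMMU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model, NOT the real TDX module ABI: one flag per guest
 * page records whether the TD has told the "TDX module" that the page is
 * pinned for DMA. Only the guest side changes this flag.
 */
#define NR_PAGES 8

struct tdx_model {
	bool dma_pinned[NR_PAGES];     /* owned by the TDX module */
	bool private_mapped[NR_PAGES]; /* page currently mapped in S-EPT */
};

/* Hypothetical guest fw call: the TD releases its DMA pin on a page. */
static void guest_fw_release_dma_pin(struct tdx_model *m, size_t pfn)
{
	assert(pfn < NR_PAGES);
	m->dma_pinned[pfn] = false;
}

/* Hypothetical host request: remove a private page from the S-EPT. */
static int host_sept_unmap(struct tdx_model *m, size_t pfn)
{
	assert(pfn < NR_PAGES);
	if (m->dma_pinned[pfn])
		return -EBUSY;	/* module refuses: TD still expects DMA here */
	m->private_mapped[pfn] = false;
	return 0;
}

/*
 * Guest-driven private->shared conversion: the pin is dropped inside the
 * guest before the request reaches the VMM, so the VMM's unmap succeeds
 * without anything being forwarded through the IOMMU.
 */
static int guest_convert_to_shared(struct tdx_model *m, size_t pfn)
{
	guest_fw_release_dma_pin(m, pfn);	/* 1: in-guest, physical unpin */
	return host_sept_unmap(m, pfn);		/* 2: VMM handles the request */
}
```

Calling host_sept_unmap() directly on a pinned page returns -EBUSY;
calling guest_convert_to_shared() on the same page returns 0.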

> 2. The IOMMU forwards the notification to TDX code in the kernel
> 3. TDX code in the kernel tells the TDX module to stop treating any PFNs
> in the range as pinned for DMA?

The TDX host can't stop the pinning. In fact, this mechanism exists
precisely to prevent the host from unpinning/unmapping DMA pages against
the guest's expectation.
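A rough sketch of the host-side view, using a hypothetical toy model
(struct tdx_model, host_sept_unmap() and host_unmap_with_retries() are
made-up names, not anything in KVM or the TDX module): the host only has
an unmap request, and there is deliberately no host-side unpin, so
retrying a refused unmap cannot succeed while the TD keeps its pin.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: per-page DMA pin flag that only the TD sets. */
#define NR_PAGES 8

struct tdx_model {
	bool dma_pinned[NR_PAGES];
};

/* The only entry point the host has: ask the module to unmap from S-EPT. */
static int host_sept_unmap(const struct tdx_model *m, size_t pfn)
{
	assert(pfn < NR_PAGES);
	return m->dma_pinned[pfn] ? -EBUSY : 0;
}

/*
 * Host-initiated attempt with retries. There is intentionally no
 * host_release_dma_pin(): the pin protects the TD from the host, so
 * nothing the host does between retries changes the outcome.
 */
static int host_unmap_with_retries(const struct tdx_model *m, size_t pfn,
				   int retries)
{
	int ret;

	do {
		ret = host_sept_unmap(m, pfn);
	} while (ret == -EBUSY && retries-- > 0);
	return ret;
}
```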

Thanks,
Yilun

>
> If the above is possible, then by the time we get to unmapping from the
> S-EPTs, the TDX module would already consider the PFNs in the range "not
> pinned for DMA".