Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
From: Vishal Annapurve
Date: Wed Jul 02 2025 - 10:33:08 EST
On Wed, Jul 2, 2025 at 7:13 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Wed, Jul 02, 2025 at 06:54:10AM -0700, Vishal Annapurve wrote:
> > On Wed, Jul 2, 2025 at 1:38 AM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 24, 2025 at 07:10:38AM -0700, Vishal Annapurve wrote:
> > > > On Tue, Jun 24, 2025 at 6:08 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jun 24, 2025 at 06:23:54PM +1000, Alexey Kardashevskiy wrote:
> > > > >
> > > > > > Now, I am rebasing my RFC on top of this patchset and it fails in
> > > > > > kvm_gmem_has_safe_refcount() as IOMMU holds references to all these
> > > > > > folios in my RFC.
> > > > > >
> > > > > > So what is the expected sequence here? The userspace unmaps a DMA
> > > > > > page and maps it back right away, all from the userspace? The end
> > > > > > result will be the exactly same which seems useless. And IOMMU TLB
> > > >
> > > > As Jason described, ideally IOMMU just like KVM, should just:
> > > > 1) Directly rely on guest_memfd for pinning -> no page refcounts taken
> > > > by IOMMU stack
> > > In TDX connect, TDX module and TDs do not trust VMM. So, it's the TDs to inform
> > > TDX module about which pages are used by it for DMAs purposes.
> > > So, if a page is regarded as pinned by TDs for DMA, the TDX module will fail the
> > > unmap of the pages from S-EPT.
>
> I don't see this as having much to do with iommufd.
>
> iommufd will somehow support the T=1 iommu inside the TDX module but
> it won't have an IOAS for it since the VMM does not control the
> translation.
>
> The discussion here is for the T=0 iommu which is controlled by
> iommufd and does have an IOAS. It should be popoulated with all the
> shared pages from the guestmemfd.
>
> > > If IOMMU side does not increase refcount, IMHO, some way to indicate that
> > > certain PFNs are used by TDs for DMA is still required, so guest_memfd can
> > > reject the request before attempting the actual unmap.
>
> This has to be delt with between the TDX module and KVM. When KVM
> gives pages to become secure it may not be able to get them back..
>
> This problem has nothing to do with iommufd.
>
> But generally I expect that the T=1 iommu follows the S-EPT entirely
> and there is no notion of pages "locked for dma". If DMA is ongoing
> and a page is made non-secure then the DMA fails.
>
> Obviously in a mode where there is a vPCI device we will need all the
> pages to be pinned in the guestmemfd to prevent any kind of
> migrations. Only shared/private conversions should change the page
> around.
Yes, guest_memfd ensures that all the faulted-in pages (irrespective
of shared or private ranges) are not migratable. We already have a
similar restriction with CPU accesses to encrypted memory ranges that
need arch specific protocols to migrate memory contents.
>
> Maybe this needs to be an integral functionality in guestmemfd?
>
> Jason