Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls

From: Ackerley Tng
Date: Tue Jul 22 2025 - 14:17:55 EST


Xu Yilun <yilun.xu@xxxxxxxxxxxxxxx> writes:

>> > > >> Yan, Yilun, would it work if, on conversion,
>> > > >>
>> > > >> 1. guest_memfd notifies IOMMU that a conversion is about to happen for a
>> > > >> PFN range
>> > > >
>> > > > It is the Guest fw call to release the pinning.
>> > >
>> > > I see, thanks for explaining.
>> > >
>> > > > By the time VMM get the
>> > > > conversion requirement, the page is already physically unpinned. So I
>> > > > agree with Jason the pinning doesn't have to reach to iommu from SW POV.
>> > > >
>> > >
>> > > If by the time KVM gets the conversion request, the page is unpinned,
>> > > then we're all good, right?
>> >
>> > Yes, unless guest doesn't unpin the page first by mistake.
>>
>> Or maliciously? :-(
>
> Yes.
>
>>
>> My initial response to this was that this is a bug and we don't need to be
>> concerned with it. However, can't this be a DOS from one TD to crash the
>> system if the host uses the private page for something else and the
>> machine #MC's?
>
> I think we are already doing something to prevent vcpus from executing
> then destroy VM, so no further TD accessing. But I assume there is
> concern a TD could just leak a lot of resources, and we are
> investigating if host can reclaim them.
>
> Thanks,
> Yilun

So a malicious guest could skip unpinning private memory, guest_memfd's
unmap would then fail, and we'd hit the KVM_BUG_ON() that Yan/Rick
suggested here [1].

Actually, it seems like a legacy guest would also trigger unmap failures
and the KVM_BUG_ON(), since once TDX Connect is enabled, pinning mode is
enforced even for non-IO private pages?

I hope your team's investigations find a good way for the host to
reclaim memory, at least from dead TDs! Otherwise this would be an open
hole for guests to leak a host's memory.

Circling back to the original topic [2], it sounds like we're okay with
the IOMMU *not* taking any refcounts on pages, and with relying on
guest_memfd to keep the pages around on behalf of the VM?
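
To make that question concrete, here is a rough userspace model of the
ownership rule as I understand it from this thread. It is only a sketch:
every toy_* name is hypothetical, none of this is kernel or TDX module
code, and the "bugged" flag merely stands in for whatever state
KVM_BUG_ON() would leave the VM in.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; all toy_* names are hypothetical, not kernel APIs. */

struct toy_page {
	int  refcount;      /* references held by the guest_memfd stand-in only */
	bool iommu_mapped;  /* the IOMMU maps the page but takes no reference */
	bool guest_pinned;  /* set and cleared by the guest (fw call in the thread) */
	bool is_private;
};

struct toy_vm {
	bool bugged;        /* stand-in for the effect of KVM_BUG_ON() */
};

/* The guest_memfd stand-in allocates the page and holds the only reference. */
static void toy_gmem_alloc(struct toy_page *p)
{
	p->refcount = 1;
	p->iommu_mapped = false;
	p->guest_pinned = false;
	p->is_private = true;
}

/* Mapping into the IOMMU deliberately leaves the refcount untouched. */
static void toy_iommu_map(struct toy_page *p)
{
	p->iommu_mapped = true;
}

static void toy_guest_pin(struct toy_page *p)   { p->guest_pinned = true; }
static void toy_guest_unpin(struct toy_page *p) { p->guest_pinned = false; }

/* Unmapping a private page fails while the guest still has it pinned. */
static int toy_unmap_private(struct toy_page *p)
{
	if (p->guest_pinned)
		return -1;
	p->iommu_mapped = false;
	return 0;
}

/*
 * Private-to-shared conversion. A well-behaved guest has already unpinned
 * the page, so the unmap succeeds. If the guest skipped the unpin, the
 * unmap fails and the VM is marked bugged; the page stays private and
 * is never handed back to the host for reuse.
 */
static int toy_convert_to_shared(struct toy_vm *vm, struct toy_page *p)
{
	/* Invariant in this model: only the guest_memfd stand-in holds a ref. */
	assert(p->refcount == 1);

	if (vm->bugged)
		return -1;
	if (toy_unmap_private(p) != 0) {
		vm->bugged = true;
		return -1;
	}
	p->is_private = false;
	return 0;
}
```

In this model a page that was pinned and then unpinned converts cleanly
with its refcount still at 1, while a page that is still pinned makes
toy_convert_to_shared() fail and marks the VM bugged. What the model
does not answer is how the host gets the still-pinned page back
afterwards, which is the reclaim question above.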

[1] https://lore.kernel.org/all/diqzcya13x2j.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/CAGtprH_qh8sEY3s-JucW3n1Wvoq7jdVZDDokvG5HzPf0HV2=pg@xxxxxxxxxxxxxx/