Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
From: Vishal Annapurve
Date: Sun Jun 15 2025 - 23:40:44 EST
On Wed, Jun 11, 2025 at 2:52 AM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown,
> which enables more efficient reclaim of private memory.
>
> Private memory is removed from MMU/TDP when guest_memfds are closed. If
> the HKID has not been released, the TDX VM is still in RUNNABLE state,
> so pages must be removed using the "Dynamic Page Removal" procedure
> (refer to the TDX Module Base spec), which involves a number of steps:
>   Block further address translation
>   Exit each VCPU
>   Clear Secure EPT entry
>   Flush/write-back/invalidate relevant caches
>
> However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state
> where all TDX VM pages are effectively unmapped, so pages can be reclaimed
> directly.
>
> Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the total
> reclaim time. For example:
>
>  VCPUs   Size (GB)   Before (secs)   After (secs)
>      4          18              72             24
>     32         107             517            134
>     64         400            5539            467
>
> Link: https://lore.kernel.org/r/Z-V0qyTn2bXdrPF7@xxxxxxxxxx
> Link: https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@xxxxxxxxxx
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Co-developed-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> ---
>
>
> Changes in V4:
>
> Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately.
> Use KVM_BUG_ON() instead of WARN_ON().
> Correct kvm_trylock_all_vcpus() return value.
>
> Changes in V3:
>
> Remove KVM_BUG_ON() from tdx_mmu_release_hkid() because it would
> trigger on the error path from __tdx_td_init()
>
> Put cpus_read_lock() handling back into tdx_mmu_release_hkid()
>
> Handle KVM_TDX_TERMINATE_VM in the switch statement, i.e. let
> tdx_vm_ioctl() deal with kvm->lock
> ....
>
> +static int tdx_terminate_vm(struct kvm *kvm)
> +{
> +        if (kvm_trylock_all_vcpus(kvm))
> +                return -EBUSY;
> +
> +        kvm_vm_dead(kvm);
With this, no more VM ioctls can be issued on this instance. How would
the userspace VMM clean up the memslots? Is the expectation that the
guest_memfd and VM fds must be closed to actually reclaim the memory?
The ability to clean up memslots from userspace without closing the
VM/guest_memfd handles is useful for reusing the same guest_memfds
across boot iterations of the VM in case of reboot.
> +
> +        kvm_unlock_all_vcpus(kvm);
> +
> +        tdx_mmu_release_hkid(kvm);
> +
> +        return 0;
> +}
> +