Re: [PATCH 09/11] KVM: guest_memfd: Add interface for populating gmem pages with user data

From: Isaku Yamahata
Date: Tue Apr 23 2024 - 19:50:25 EST


On Thu, Apr 04, 2024 at 02:50:31PM -0400,
Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:

> During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
> prepare newly-allocated gmem pages prior to mapping them into the guest.
> In the case of SEV-SNP, this mainly involves setting the pages to
> private in the RMP table.
>
> However, for the GPA ranges comprising the initial guest payload, which
> are encrypted/measured prior to starting the guest, the gmem pages need
> to be accessed prior to setting them to private in the RMP table so they
> can be initialized with the userspace-provided data. Additionally, an
> SNP firmware call is needed afterward to encrypt them in-place and
> measure the contents into the guest's launch digest.
>
> While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
> this handling can be done in an open-coded/vendor-specific manner, this
> may expose more gmem-internal state/dependencies to external callers
> than necessary. Try to avoid this by implementing an interface that
> tries to handle as much of the common functionality inside gmem as
> possible, while also making it generic enough to potentially be
> usable/extensible for TDX as well.
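To make the quoted design concrete, here is a rough user-space model of a generic gmem "populate" helper that does the common work (take the file lock, copy user data into still-shared pages) and defers the vendor step to a per-arch hook. Every name in it (model_gmem_populate(), model_post_populate_t, demo_encrypt_and_measure()) is hypothetical and is not the interface from this patch. A bool stands in for the gmem file's invalidate_lock, a per-gfn flag stands in for the RMP "private" state, and the XOR is only a placeholder for the SNP firmware call that encrypts and measures. The model also assumes the generic code marks a page private once the hook succeeds; the real SNP ordering of the RMP update and the firmware call may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are real KVM APIs. */
#define MODEL_NR_GFNS   8
#define MODEL_PAGE_SIZE 4096

struct model_gmem {
	bool invalidate_locked;                        /* stands in for the file's invalidate_lock */
	bool is_private[MODEL_NR_GFNS];                /* stands in for the RMP "private" state */
	uint8_t pages[MODEL_NR_GFNS][MODEL_PAGE_SIZE]; /* backing memory, pfn == gfn here */
};

/* Per-arch hook run on each filled page (e.g. SNP: encrypt in place + measure). */
typedef int (*model_post_populate_t)(uint64_t gfn, uint8_t *page, void *opaque);

/*
 * Copy npages of user data starting at gfn into gmem, then let the arch hook
 * process each page before it is flagged private.
 * Returns npages on success or a negative errno.
 */
long model_gmem_populate(struct model_gmem *gmem, uint64_t gfn, const uint8_t *src,
			 long npages, model_post_populate_t post_populate, void *opaque)
{
	int ret = 0;

	assert(!gmem->invalidate_locked);  /* the lock is not recursive */
	gmem->invalidate_locked = true;    /* filemap_invalidate_lock() in the kernel */
	for (long i = 0; i < npages; i++) {
		uint64_t cur = gfn + (uint64_t)i;

		if (cur >= MODEL_NR_GFNS) {
			ret = -EINVAL;
			break;
		}
		if (gmem->is_private[cur]) {  /* already private: host can't write it */
			ret = -EEXIST;
			break;
		}
		/* Still shared, so the host may write the user-provided payload. */
		memcpy(gmem->pages[cur], src + i * MODEL_PAGE_SIZE, MODEL_PAGE_SIZE);

		ret = post_populate(cur, gmem->pages[cur], opaque);
		if (ret)
			break;
		gmem->is_private[cur] = true;
	}
	gmem->invalidate_locked = false;
	return ret ? ret : npages;
}

/* Example hook: XOR as fake "encryption", a counter as a fake launch digest. */
int demo_encrypt_and_measure(uint64_t gfn, uint8_t *page, void *opaque)
{
	int *measured_pages = opaque;

	(void)gfn;
	for (size_t off = 0; off < MODEL_PAGE_SIZE; off++)
		page[off] ^= 0x5a;
	(*measured_pages)++;
	return 0;
}
```

The point of the split is that the vendor hook only sees a gfn and a page that gmem has already located and filled, so gmem internals (the lock, the allocation) stay inside gmem.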

I explored how TDX would use this hook, but in the end TDX doesn't use it and
instead uses kvm_tdp_mmu_get_walk() with a twist. The patch is below.

Because SEV-SNP manages the RMP, which is not tied directly to the NPT, SEV-SNP
can ignore the TDP MMU page tables when updating the RMP.
TDX, on the other hand, essentially updates the Secure-EPT when it adds a page
to the guest with TDH.MEM.PAGE.ADD(). It therefore needs to protect the KVM TDP
MMU page tables with mmu_lock, not the guest_memfd file mapping with
invalidate_lock, so the hook doesn't fit TDX well. The resulting
KVM_TDX_INIT_MEM_REGION logic is as follows:

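get_user_pages_fast(source addr)          /* pin the source page */
read_lock(mmu_lock)
kvm_tdp_mmu_get_walk_private_pfn(vcpu, gpa, &pfn);
if the page table doesn't map gpa:
        read_unlock(mmu_lock), put_page(), return error
TDH.MEM.PAGE.ADD()                        /* add the page to the TD */
TDH.MR.EXTEND()                           /* extend the TD measurement */
read_unlock(mmu_lock)
put_page()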
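The ordering above can be exercised with a small user-space model. Everything in it is hypothetical: a reader counter stands in for mmu_lock, an array of mapped gfns stands in for the TDP MMU page tables, model_get_walk_private_pfn() is a toy stand-in for kvm_tdp_mmu_get_walk_private_pfn(), pfn == gfn, and the two SEAMCALLs are stubs. The one TDX detail it relies on is that TDH.MR.EXTEND measures 256 bytes per call, so a 4 KiB page takes 16 calls. It mainly shows that the unmapped-gpa error path still drops mmu_lock and unpins the source page.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of KVM_TDX_INIT_MEM_REGION; not real KVM code. */
#define MODEL_NR_GFNS         4
#define MODEL_PAGE_SIZE       4096
#define MODEL_MR_EXTEND_CHUNK 256   /* TDH.MR.EXTEND measures 256 bytes per call */

struct model_td {
	int mmu_lock_readers;                                /* read_lock(mmu_lock) holders */
	int pinned_src_pages;                                /* gup_fast() minus put_page() */
	int mapped[MODEL_NR_GFNS];                           /* does the TDP MMU map this gfn? */
	uint8_t private_mem[MODEL_NR_GFNS][MODEL_PAGE_SIZE]; /* pfn == gfn in this model */
	int mr_extend_calls;                                 /* TDH.MR.EXTEND invocations */
};

static void model_read_lock(struct model_td *td)
{
	td->mmu_lock_readers++;
}

static void model_read_unlock(struct model_td *td)
{
	assert(td->mmu_lock_readers > 0);
	td->mmu_lock_readers--;
}

/* Toy walk: fails if the TDP MMU doesn't map gpa. */
static int model_get_walk_private_pfn(struct model_td *td, uint64_t gpa, uint64_t *pfn)
{
	uint64_t gfn = gpa / MODEL_PAGE_SIZE;

	assert(td->mmu_lock_readers > 0);  /* the walk is only valid under mmu_lock */
	if (gfn >= MODEL_NR_GFNS || !td->mapped[gfn])
		return -ENOENT;
	*pfn = gfn;
	return 0;
}

int model_tdx_init_mem_region(struct model_td *td, uint64_t gpa, const uint8_t *src)
{
	uint64_t pfn;
	int ret;

	td->pinned_src_pages++;     /* get_user_pages_fast(source addr) */
	model_read_lock(td);        /* read_lock(mmu_lock) */

	ret = model_get_walk_private_pfn(td, gpa, &pfn);
	if (ret)
		goto out;           /* unmapped gpa: still unlock and unpin */

	/* TDH.MEM.PAGE.ADD() stub: copy the source page into the private page. */
	memcpy(td->private_mem[pfn], src, MODEL_PAGE_SIZE);

	/* TDH.MR.EXTEND() stub: one call per 256-byte chunk of the page. */
	for (int off = 0; off < MODEL_PAGE_SIZE; off += MODEL_MR_EXTEND_CHUNK)
		td->mr_extend_calls++;
out:
	model_read_unlock(td);      /* read_unlock(mmu_lock) */
	td->pinned_src_pages--;     /* put_page() */
	return ret;
}
```

Taking mmu_lock for read is what keeps the walk result stable while the SEAMCALLs run; no guest_memfd lock appears anywhere in this path.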