Re: [PATCH v11 031/113] KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE

From: Huang, Kai
Date: Mon Jan 16 2023 - 05:56:07 EST


On Thu, 2023-01-12 at 08:31 -0800, isaku.yamahata@xxxxxxxxx wrote:
> From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>
> For TD guest, the current way to emulate MMIO doesn't work any more, as KVM
> is not able to access the private memory of TD guest and do the emulation.
> Instead, TD guest expects to receive #VE when it accesses the MMIO and then
> it can explicitly make hypercall to KVM to get the expected information.
>
> To achieve this, the TDX module always enables "EPT-violation #VE" in the
> VMCS control. And accordingly, for the MMIO spte for the shared GPA,
> 1. KVM needs to set "suppress #VE" bit for the non-present SPTE so that EPT
> violation happens on TD accessing MMIO range. 2. On EPT violation, KVM
> sets the MMIO spte to clear "suppress #VE" bit so the TD guest can receive
> the #VE instead of EPT misconfigration unlike VMX case. For the shared GPA
> that is not populated yet, EPT violation need to be triggered when TD guest
> accesses such shared GPA. The non-present SPTE value for shared GPA should
> set "suppress #VE" bit.
>
> Add "suppress #VE" bit (bit 63) to SHADOW_NONPRESENT_VALUE and
> REMOVED_SPTE. Unconditionally set the "suppress #VE" bit (which is bit 63)
> for both AMD and Intel as: 1) AMD hardware doesn't use this bit when
> present bit is off; 2) for normal VMX guest, KVM never enables the
> "EPT-violation #VE" in VMCS control and "suppress #VE" bit is ignored by
> hardware.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> ---
> arch/x86/include/asm/vmx.h | 1 +
> arch/x86/kvm/mmu/spte.h | 15 ++++++++++++++-
> arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++++++
> 3 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 498dc600bd5c..cdbf12c1a83c 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -511,6 +511,7 @@ enum vmcs_field {
> #define VMX_EPT_IPAT_BIT (1ull << 6)
> #define VMX_EPT_ACCESS_BIT (1ull << 8)
> #define VMX_EPT_DIRTY_BIT (1ull << 9)
> +#define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63)

I don't know whether you should introduce this macro, since it's not used in
this patch.

> #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \
> VMX_EPT_WRITABLE_MASK | \
> VMX_EPT_EXECUTABLE_MASK)
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index f190eaf6b2b5..471378ee9071 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -148,7 +148,20 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
>
> #define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0)
>
> +/*
> + * Non-present SPTE value for both VMX and SVM for TDP MMU.
> + * For SVM NPT, for non-present spte (bit 0 = 0), other bits are ignored.
> + * For VMX EPT, bit 63 is ignored if #VE is disabled. (EPT_VIOLATION_VE=0)
> + * bit 63 is #VE suppress if #VE is enabled. (EPT_VIOLATION_VE=1)
> + * For TDX:
> + * TDX module sets EPT_VIOLATION_VE for Secure-EPT and conventional EPT
> + */
> +#ifdef CONFIG_X86_64
> +#define SHADOW_NONPRESENT_VALUE BIT_ULL(63)
> +static_assert(!(SHADOW_NONPRESENT_VALUE & SPTE_MMU_PRESENT_MASK));
> +#else
> #define SHADOW_NONPRESENT_VALUE 0ULL
> +#endif
>
> extern u64 __read_mostly shadow_host_writable_mask;
> extern u64 __read_mostly shadow_mmu_writable_mask;
> @@ -195,7 +208,7 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
> *
> * Only used by the TDP MMU.
> */
> -#define REMOVED_SPTE 0x5a0ULL
> +#define REMOVED_SPTE (SHADOW_NONPRESENT_VALUE | 0x5a0ULL)

This chunk belongs to the previous patch.

>
> /* Removed SPTEs must not be misconstrued as shadow present PTEs. */
> static_assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 9cf5844dd34a..6111e3e9266d 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -700,6 +700,14 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
> * overwrite the special removed SPTE value. No bookkeeping is needed
> * here since the SPTE is going from non-present to non-present. Use
> * the raw write helper to avoid an unnecessary check on volatile bits.
> + *
> + * Set non-present value to SHADOW_NONPRESENT_VALUE, rather than 0.
> + * It is because when TDX is enabled, TDX module always
> + * enables "EPT-violation #VE", so KVM needs to set
> + * "suppress #VE" bit in EPT table entries, in order to get
> + * real EPT violation, rather than TDVMCALL. KVM sets
> + * SHADOW_NONPRESENT_VALUE (which sets "suppress #VE" bit) so it
> + * can be set when EPT table entries are zapped.
> */
> __kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE);
>

I don't quite think this place is the good place to explain "suppress #VE" bit
staff. There are other places that sets non-present SPTE too. Perhaps putting
the comment around the macro 'SHADOW_NONPRESENT_VALUE' is better.