Re: [PATCH] KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR

From: Sean Christopherson
Date: Fri Apr 29 2022 - 10:24:45 EST


On Fri, Apr 29, 2022, Paolo Bonzini wrote:
> On 4/29/22 01:34, Sean Christopherson wrote:
>
> > +static inline gfn_t kvm_mmu_max_gfn_host(void)
> > +{
> > +	/*
> > +	 * Disallow SPTEs (via memslots or cached MMIO) whose gfn would exceed
> > +	 * host.MAXPHYADDR. Assuming KVM is running on bare metal, guest
> > +	 * accesses beyond host.MAXPHYADDR will hit a #PF(RSVD) and never hit
> > +	 * an EPT Violation/Misconfig / #NPF, and so KVM will never install a
> > +	 * SPTE for such addresses. That doesn't hold true if KVM is running
> > +	 * as a VM itself, e.g. if the MAXPHYADDR KVM sees is less than
> > +	 * hardware's real MAXPHYADDR, but since KVM can't honor such behavior
> > +	 * on bare metal, disallow it entirely to simplify e.g. the TDP MMU.
> > +	 */
> > +	return (1ULL << (shadow_phys_bits - PAGE_SHIFT)) - 1;
>
> The host.MAXPHYADDR however does not matter if EPT/NPT is not in use, because
> the shadow paging fault path can accept any gfn.

...

> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index e6cae6f22683..dba275d323a7 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -65,6 +65,30 @@ static __always_inline u64 rsvd_bits(int s, int e)
> return ((2ULL << (e - s)) - 1) << s;
> }
> +/*
> + * The number of non-reserved physical address bits irrespective of features
> + * that repurpose legal bits, e.g. MKTME.
> + */
> +extern u8 __read_mostly shadow_phys_bits;
> +
> +static inline gfn_t kvm_mmu_max_gfn(void)
> +{
> +	/*
> +	 * Note that this uses the host MAXPHYADDR, not the guest's.
> +	 * EPT/NPT cannot support GPAs that would exceed host.MAXPHYADDR;
> +	 * assuming KVM is running on bare metal, guest accesses beyond
> +	 * host.MAXPHYADDR will hit a #PF(RSVD) and never cause a vmexit
> +	 * (either EPT Violation/Misconfig or #NPF), and so KVM will never
> +	 * install a SPTE for such addresses. If KVM is running as a VM
> +	 * itself, on the other hand, it might see a MAXPHYADDR that is less
> +	 * than hardware's real MAXPHYADDR. Using the host MAXPHYADDR
> +	 * disallows such SPTEs entirely and simplifies the TDP MMU.
> +	 */
> +	int max_gpa_bits = likely(tdp_enabled) ? shadow_phys_bits : 52;

I don't love the divergent memslot behavior, but it's technically correct, so I
can't really argue. Do we want to "officially" document the memslot behavior?

> +
> +	return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
> +}
> +
> void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
> void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
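
To make the divergence concrete, here is a rough userspace sketch of the
computation in the quoted kvm_mmu_max_gfn(). It is illustrative only and not
part of the patch: max_gfn_sketch() is a hypothetical name, tdp_enabled and
shadow_phys_bits are passed as parameters instead of being read from KVM's
globals, PAGE_SHIFT is assumed to be 12 (4KiB pages), and likely() is dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4KiB base pages, as on x86. */
#define PAGE_SHIFT 12

typedef uint64_t gfn_t;

/*
 * Hypothetical stand-in for kvm_mmu_max_gfn().  With TDP (EPT/NPT) the limit
 * follows the host's MAXPHYADDR; with shadow paging it is the architectural
 * maximum of 52 physical address bits.
 */
static gfn_t max_gfn_sketch(bool tdp_enabled, unsigned int shadow_phys_bits)
{
	unsigned int max_gpa_bits = tdp_enabled ? shadow_phys_bits : 52;

	/* Highest gfn whose GPA still fits in max_gpa_bits. */
	return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
}
```

E.g. assuming a host that reports 46 physical address bits, the max gfn is
0x3ffffffff with TDP enabled and 0xffffffffff with shadow paging, i.e. the same
memslot can be accepted or rejected depending on tdp_enabled.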