Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation

From: Radim KrÄmÃÅ
Date: Thu Dec 01 2016 - 08:22:47 EST


2016-11-30 15:53-0800, David Matlack:
> On Wed, Nov 30, 2016 at 2:33 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>> ----- Original Message -----
>>> From: "Radim KrÄmÃÅ" <rkrcmar@xxxxxxxxxx>
>>> To: "David Matlack" <dmatlack@xxxxxxxxxx>
>>> Cc: kvm@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, jmattson@xxxxxxxxxx, pbonzini@xxxxxxxxxx
>>> Sent: Wednesday, November 30, 2016 10:52:35 PM
>>> Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation
>>>
>>> 2016-11-29 18:14-0800, David Matlack:
>>> > KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning
>>> > all CR0 and CR4 bits are allowed to be 1 during VMX operation.
>>> >
>>> > This does not match real hardware, which disallows the high 32 bits of
>>> > CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits
>>> > which are defined in the SDM but missing according to CPUID). A guest
>>> > can induce a VM-entry failure by setting these bits in GUEST_CR0 and
>>> > GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are
>>> > valid.
>>> >
>>> > Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing
>>> > checks on these registers do not verify must-be-0 bits. Fix these checks
>>> > to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1.
>>> >
>>> > This patch should introduce no change in behavior in KVM, since these
>>> > MSRs are still -1ULL.
>>> >
>>> > Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
>>> > ---
>>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> > @@ -4104,6 +4110,40 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
>>> > +static bool nested_guest_cr0_valid(struct kvm_vcpu *vcpu, unsigned long
>>> > val)
>>> > +{
>>> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0;
>>> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1;
>>> > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>>> > +
>>> > + if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high &
>>> > + SECONDARY_EXEC_UNRESTRICTED_GUEST &&
>>> > + nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
>>> > + fixed0 &= ~(X86_CR0_PE | X86_CR0_PG);
>>>
>>> These bits also seem to be guaranteed in fixed1 ... complicated
>>> dependencies.
>>
>> Bits that are set in fixed0 must be set in fixed1 too. Since patch 4
>> always sets CR0_FIXED1 to all-ones (matching bare metal), this is okay.
>>
>>> There is another exception, SDM 26.3.1.1 (Checks on Guest Control
>>> Registers, Debug Registers, and MSRs):
>>>
>>> Bit 29 (corresponding to CR0.NW) and bit 30 (CD) are never checked
>>> because the values of these bits are not changed by VM entry; see
>>> Section 26.3.2.1.
>>
>> Same here, we never check them anyway.

True.

>>> And another check:
>>>
>>> If bit 31 in the CR0 field (corresponding to PG) is 1, bit 0 in that
>>> field (PE) must also be 1.
>>
>> This should not be a problem, a failed vmentry is reflected into L1
>> anyway. We only need to check insofar as we could have a more restrictive
>> check than what the processor does.
>
> I had the same thought when I was first writing this patch, Radim.
> Maybe we should add a comment here. E.g.
>
> /*
> * CR0.PG && !CR0.PE is also invalid but caught by the CPU
> * during VM-entry to L2.
> */

Ah, no need, thanks. I expected that we want to kill L1 when VM entry
failure happens, because it could have been L0's bug too ... we also
pass failure while checking host state, which is definitely L0 bug and
shouldn't ever get to L1.

I'd like to assume that KVM is (going to be) sound, so both cases are
reasonable (just hindering development).