Re: [PATCH v2] KVM: VMX: Enable Notify VM exit

From: Xiaoyao Li
Date: Sun Sep 12 2021 - 22:58:33 EST


On 9/10/2021 2:59 AM, Sean Christopherson wrote:
On Tue, Sep 07, 2021, Xiaoyao Li wrote:
On 9/3/2021 12:36 AM, Sean Christopherson wrote:
On Thu, Sep 02, 2021, Sean Christopherson wrote:
On Tue, Aug 03, 2021, Xiaoyao Li wrote:
On 8/2/2021 11:46 PM, Sean Christopherson wrote:
@@ -5642,6 +5653,31 @@ static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu)
 	return 0;
 }
+static int handle_notify(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qual = vmx_get_exit_qual(vcpu);
+
+	if (!(exit_qual & NOTIFY_VM_CONTEXT_INVALID)) {

What does CONTEXT_INVALID mean? The ISE doesn't provide any information whatsoever.

It indicates whether the VM context is corrupted and no longer valid in the VMCS.

Well that's a bit terrifying. Under what conditions can the VM context become
corrupted? E.g. if the context can be corrupted by an inopportune NOTIFY exit,
then KVM needs to be ultra conservative as a false positive could be fatal to a
guest.


Short answer is no case will set the VM_CONTEXT_INVALID bit.

But something must set it, otherwise it wouldn't exist.

On existing Intel silicon, no case will set it. New cases may set it in
the future.

The condition(s) under
which it can be set matters because it affects how KVM should respond. E.g. if
the guest can trigger VM_CONTEXT_INVALID at will, then we should probably treat
it as a shutdown and reset the VMCS.

Oh, and "shutdown" would be relative to the VMCS, i.e. if L2 triggers a NOTIFY
exit with VM_CONTEXT_INVALID then KVM shouldn't kill the entire VM. The least
awful option would probably be to synthesize a shutdown VM-Exit to L1. That
won't communicate to L1 that vmcs12 state is stale/bogus, but I don't see any way
to handle that via an existing VM-Exit reason :-/

But if VM_CONTEXT_INVALID can occur if and only if there's a hardware/ucode
issue, then we can do:

	if (KVM_BUG_ON(exit_qual & NOTIFY_VM_CONTEXT_INVALID, vcpu->kvm))
		return -EIO;

Either way, to enable this by default we need some form of documentation that
describes what conditions lead to VM_CONTEXT_INVALID.
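
For readers following along, the "hardware/ucode issue only" option above can be modelled outside the kernel. This is a minimal sketch, not KVM code: struct model_vm and model_handle_notify() are hypothetical names, the bugged flag only imitates the effect of KVM_BUG_ON() marking the VM unusable, and placing NOTIFY_VM_CONTEXT_INVALID at bit 0 of the exit qualification is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 0 of the exit qualification flags an invalid VM context. */
#define NOTIFY_VM_CONTEXT_INVALID (UINT64_C(1) << 0)

/* Hypothetical stand-in for the per-VM state that KVM_BUG_ON() would mark. */
struct model_vm {
	bool bugged;
};

/*
 * Model of the policy where a set VM_CONTEXT_INVALID bit is treated as a
 * hardware/ucode bug: the VM is marked unusable and -EIO is returned.
 * Returns 1 (resume the guest) when the context is still valid.
 */
static int model_handle_notify(struct model_vm *vm, uint64_t exit_qual)
{
	assert(vm != NULL);

	if (exit_qual & NOTIFY_VM_CONTEXT_INVALID) {
		vm->bugged = true;
		return -EIO;
	}

	return 1;
}
```

Under this policy the whole VM is lost whenever the bit is set, which is only acceptable if software cannot set the bit at will.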

I still don't understand why the conditions that lead to it matter. I think
the consensus is that once VM_CONTEXT_INVALID happens, the vCPU can no longer
run.

Yes, and no longer being able to run the vCPU is precisely the problem. The
condition(s) matters because if there's a possibility, however small, that enabling
NOTIFY_WINDOW can kill a well-behaved guest then it absolutely cannot be enabled by
default.

For now, no condition will set it. In the future, I believe it will be set
only for some fatal cases. However, we cannot guarantee that no silicon bug
will break a well-behaved guest. Maybe let's make it opt-in?

Either KVM_BUG_ON() or a specific EXIT to userspace should be OK?

Not if the VM_CONTEXT_INVALID happens while L2 is running. If software can trigger
VM_CONTEXT_INVALID at will, then killing the VM would open up the door to a
malicious L2 killing L1 (which would be rather ironic since this is an anti-DoS
feature). IIUC, VM_CONTEXT_INVALID only means the current VMCS is garbage, thus
an occurrence while L2 is active means that vmcs02 is junk, but L1's state in vmcs01,
vmcs12, etc... is still valid.


Maybe we can kill only L2 when VM_CONTEXT_INVALID happens in L2.
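
The nested case discussed in this thread can be summarised as a small decision function. This is a sketch under stated assumptions rather than a proposed patch: enum model_action, struct model_vcpu, and model_notify_action() are hypothetical names, the in_l2 flag stands in for whatever KVM uses to tell that L2 is active, and bit 0 for NOTIFY_VM_CONTEXT_INVALID is assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 0 of the exit qualification flags an invalid VM context. */
#define NOTIFY_VM_CONTEXT_INVALID (UINT64_C(1) << 0)

/* Hypothetical outcomes of a NOTIFY exit in this model. */
enum model_action {
	MODEL_RESUME_GUEST,      /* context valid: keep running */
	MODEL_SHUTDOWN_TO_L1,    /* vmcs02 is junk: kill only L2 */
	MODEL_EXIT_TO_USERSPACE, /* vmcs01 is junk: the vCPU cannot run */
};

/* Hypothetical vCPU state: true while the nested guest (L2) is active. */
struct model_vcpu {
	bool in_l2;
};

/*
 * Pick a response so that a NOTIFY exit with an invalid context taken
 * while L2 runs cannot take L1 down with it.
 */
static enum model_action model_notify_action(const struct model_vcpu *vcpu,
					     uint64_t exit_qual)
{
	assert(vcpu != NULL);

	if (!(exit_qual & NOTIFY_VM_CONTEXT_INVALID))
		return MODEL_RESUME_GUEST;

	if (vcpu->in_l2)
		return MODEL_SHUTDOWN_TO_L1;

	return MODEL_EXIT_TO_USERSPACE;
}
```

The point of the split is that only the currently loaded VMCS is assumed to be garbage, so the blast radius is limited to the guest that was running on it.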