Re: [PATCH] KVM: x86: emulate wait-for-SIPI and SIPI-VMExit

From: Paolo Bonzini
Date: Thu Nov 05 2020 - 11:09:00 EST


On 22/09/20 07:23, yadong.qi@xxxxxxxxx wrote:
From: Yadong Qi <yadong.qi@xxxxxxxxx>

Background: We have a lightweight HV that needs INIT-VMExit and
SIPI-VMExit to wake up APs for guests, since it does not monitor
the Local APIC. But currently the virtual wait-for-SIPI (WFS) state
is not supported in nVMX, so when running on top of KVM, the L1
HV cannot receive the INIT-VMExit and SIPI-VMExit, and as a result
the L2 guest cannot wake up the APs.

According to Intel SDM Chapter 25.2 Other Causes of VM Exits,
SIPIs cause VM exits when a logical processor is in
wait-for-SIPI state.

In this patch:
1. introduce SIPI exit reason,
2. introduce wait-for-SIPI state for nVMX,
3. advertise wait-for-SIPI support to guest.

When the L1 hypervisor is not monitoring the Local APIC, L0 needs to
emulate INIT-VMExit and SIPI-VMExit to L1 in order to emulate
INIT-SIPI-SIPI for L2. L2's LAPIC writes are trapped by the L0
hypervisor (KVM), so L0 should emulate the INIT/SIPI vmexit to the L1
hypervisor, which can then set the proper vcpu state for L2.

There is a problem in this patch, in that this change is incorrect:


@@ -2847,7 +2847,8 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	 */
 	if (kvm_vcpu_latch_init(vcpu)) {
 		WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
-		if (test_bit(KVM_APIC_SIPI, &apic->pending_events))
+		if (test_bit(KVM_APIC_SIPI, &apic->pending_events) &&
+		    !is_guest_mode(vcpu))
 			clear_bit(KVM_APIC_SIPI, &apic->pending_events);
 		return;
 	}

Here you're not trying to process a latched INIT; you just want to delay the processing of the SIPI until check_nested_events.

The change does have a correct part in it. In particular, vmx_apic_init_signal_blocked should have been

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 47b8357b9751..64339121a4f0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7558,7 +7558,7 @@ static void enable_smi_window(struct kvm_vcpu *vcpu)

 static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
 {
-	return to_vmx(vcpu)->nested.vmxon;
+	return to_vmx(vcpu)->nested.vmxon && !is_guest_mode(vcpu);
 }

 static void vmx_migrate_timers(struct kvm_vcpu *vcpu)

to only latch INIT signals in root mode. However, SIPI must be cleared unconditionally on SVM; the "!is_guest_mode" test in that case is incorrect.
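
To make the difference between the two predicates concrete, here is a small userspace sketch. It is only an illustration: struct toy_vmx_state and both helper names are hypothetical stand-ins, not kernel code, and the two booleans merely model nested.vmxon and is_guest_mode().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of state the predicate looks at. */
struct toy_vmx_state {
	bool vmxon;      /* models nested.vmxon: L1 has executed VMXON */
	bool guest_mode; /* models is_guest_mode(): the vCPU is running L2 */
};

/* Behaviour before the change: INIT is latched whenever VMXON was executed. */
static bool init_blocked_old(const struct toy_vmx_state *s)
{
	return s->vmxon;
}

/* Behaviour after the change: INIT is latched only in VMX root mode. */
static bool init_blocked_new(const struct toy_vmx_state *s)
{
	return s->vmxon && !s->guest_mode;
}
```

The two helpers disagree only when vmxon and guest_mode are both set, i.e. while L2 is running; that is the case where the INIT has to become a vmexit to L1 instead of staying latched.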

The right way to do it is to call check_nested_events from kvm_apic_accept_events. This will cause an INIT or SIPI vmexit, as required. There is some extra complication: pending_events has to be read *before* kvm_apic_accept_events, so as not to steal from the guest any INIT or SIPI that is sent after kvm_apic_accept_events returns.
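
The ordering concern can be sketched with a userspace toy model. This is not the patch I intend to send; every toy_* name and TOY_EVENT_* bit below is hypothetical, and the "vmexit" is just a field update. The point it tries to show is: take one snapshot of the pending bits, let the nested check consume that snapshot, and clear only the snapshotted bits so that anything arriving later stays pending.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_EVENT_INIT (1ul << 0)
#define TOY_EVENT_SIPI (1ul << 1)

/* Hypothetical model of the APIC's pending event bitmap. */
struct toy_apic {
	atomic_ulong pending_events;
};

/* Hypothetical model of a vCPU; the last two fields only record outcomes. */
struct toy_vcpu {
	struct toy_apic apic;
	bool guest_mode;             /* running L2 */
	unsigned long exits_to_l1;   /* events reflected to L1 as vmexits */
	unsigned long handled_by_l0; /* events processed directly by L0 */
};

/* Stand-in for check_nested_events: reflect the snapshot to L1 if in L2. */
static bool toy_check_nested_events(struct toy_vcpu *v, unsigned long pe)
{
	if (!v->guest_mode)
		return false;
	/* Clear only the bits we saw; later arrivals remain pending. */
	atomic_fetch_and(&v->apic.pending_events, ~pe);
	v->exits_to_l1 |= pe;
	v->guest_mode = false; /* the vmexit returns control to L1 */
	return true;
}

/* Stand-in for the accept-events path. */
static void toy_accept_events(struct toy_vcpu *v)
{
	/* Snapshot first, before any nested check runs. */
	unsigned long pe = atomic_load_explicit(&v->apic.pending_events,
						memory_order_acquire);
	if (!pe)
		return;
	if (toy_check_nested_events(v, pe))
		return;
	atomic_fetch_and(&v->apic.pending_events, ~pe);
	v->handled_by_l0 |= pe;
}
```

With a SIPI pending while guest_mode is set, toy_accept_events records it in exits_to_l1 and leaves handled_by_l0 untouched; an INIT raised afterwards is still in pending_events for the next call.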

Thanks to your test case, I will test a patch and send it.

Paolo