Re: [PATCH] svm: Fix AVIC incomplete IPI emulation

From: Suthikulpanit, Suravee
Date: Mon Mar 11 2019 - 07:38:15 EST


Hi Oren,

Sorry for delay response.

On 3/5/19 1:15 AM, Oren Twaig wrote:
> Hello Suravee,
>
> According to AMD's SDM, the target-not-running incomplete
> ipi exit is only received if any of the destination cpus had the
> not-running bit set in the avic backing page.

I believe you are referring to the "isRunning" (IR) bit is in the
AVIC physical APIC ID table entry.

> However, not before the CPU _already_ set the relevant IRR bit
> in all these cpus.

Not sure what you meant here.

> In this change, the patch forces KVM to send another interrupt
> to the vcpu whether SVM already did that or not. Which means
> the vcpu/s, under some conditions, can get an EXTRA interrupt
> it never intended to get >> Example:
> Â 1. vcpu B: Is in "not-running" state.
> Â 2. vcpu A: Writes to the ICR to send vector 80 to vcpu B
> Â 3. vcpu A: SVM updates vcpu B IRR with bit 80
> Â 4. vcpu A: SVM exits on incomplete IPI target-not-running exit.
> Â 5. vcpu A: Now stops executing any code @ hypervisor level.
> Â 6. vcpu B: Due to another interrupt (like lapic timer)
> ÂÂÂÂ resumes running the guest. While handling interrupts,
> ÂÂÂÂ it also handles interrupt vector 80 (as it's in his IRR)
> Â 7. vcpu A: resumes executing the below code and sends
> ÂÂÂÂ an _additional_interrupt to vcpu B.
>
> Overall, vcpu B got two interrupts. The second is unwanted and
> not documented in the system architecture.
>
> Can you please elaborate more to why the implementation
> below conflict with the specifications (which was the code
> before this commit) ?
This patch was introduced to fix an issue where the SVM driver tries to
handle the step 5 above by scheduling vcpu B into _running_ state to handle
the IPI from vcpu A. However, prior to this patch, vcpu B was never get
scheduled to run unless there are other interrupts (e.g. timer).
This should not be the case as Vcpu B should have been running regardless
of other interrupts. So, I don't think step 6 and 7 above are correct.

The issue was caused by the apic->irr_pending not set to true when trying to
get vcpu B scheduled. This flag is checked in apic_find_highest_irr() before
searching for the highest bit.

To fix the issue, I decided to leverage the existing emulation code for
ICR and ICR2, which in turn calls apic_send_ipi() to deliver interrupt to vpu B.

However, looking a bit more closely, I notice the logic in svm_deliver_avic_intr()
should also have been changed from kvm_vcpu_wake_up() to kvm_vcpu_kick()
since the latter will result in clearing the IRR bit for the IPI vector
when trying to send IPI as part of the following call path.

vcpu_enter_guest()
|-- inject_pending_event()
|-- kvm_cpu_get_interrupt()
|-- kvm_get_apic_interrupt()
|-- apic_clear_irr()
|-- apic_set_isr()
|-- apic_update_ppr() ....

Please see the patch below.

Not sure if this would address the problem you are seeing.

Thanks,
Suravee

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 24dfa6a93711..d2841c3dbc04 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5219,11 +5256,13 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
kvm_lapic_set_irr(vec, vcpu->arch.apic);
smp_mb__after_atomic();

- if (avic_vcpu_is_running(vcpu))
+ if (avic_vcpu_is_running(vcpu)) {
wrmsrl(SVM_AVIC_DOORBELL,
kvm_cpu_get_apicid(vcpu->cpu));
- else
- kvm_vcpu_wake_up(vcpu);
+ } else {
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ kvm_vcpu_kick(vcpu);
+ }
}

static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)