Re: [RFC PATCH v5 075/104] KVM: x86: Check for pending APICv interrupt in kvm_vcpu_has_events()

From: Sean Christopherson
Date: Fri Apr 08 2022 - 12:24:35 EST


On Fri, Mar 04, 2022, isaku.yamahata@xxxxxxxxx wrote:
> From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>
> Return true for kvm_vcpu_has_events() if the vCPU has a pending APICv
> interrupt to support TDX's usage of APICv. Unlike VMX, TDX doesn't have
> access to vmcs.GUEST_INTR_STATUS and so can't emulate posted interrupts,

Based on the discussion in the HLT patch, this is no longer true.

> i.e. needs to generate a posted interrupt and more importantly can't
> manually move requested interrupts into the vIRR (which it also doesn't
> have access to).
>
> Because pi_has_pending_interrupt() is heavy operation which uses two atomic
> test bit operations and one atomic 256 bit bitmap check, introduce new
> callback for this check instead of reusing dy_apicv_has_pending_interrupt()
> callback to avoid affecting the exiting code.

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 89d04cd64cd0..314ae43e07bf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12111,7 +12111,10 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>
> if (kvm_arch_interrupt_allowed(vcpu) &&
> (kvm_cpu_has_interrupt(vcpu) ||
> - kvm_guest_apic_has_interrupt(vcpu)))
> + kvm_guest_apic_has_interrupt(vcpu) ||
> + (vcpu->arch.apicv_active &&
> + kvm_x86_ops.apicv_has_pending_interrupt &&
> + kvm_x86_ops.apicv_has_pending_interrupt(vcpu))))

This is pretty gross (fully realizing that I wrote this patch). It's also arguably
wrong as it really should be called from apic_has_interrupt_for_ppr().

1. The hook implies it is valid for APICv in general, which is misleading.

2. It's wasted effort for VMX.

3. It does a poor job of conveying _why_ TDX is different.

4. KVM is unnecessarily processing its useless "copy" of the PPR/IRR for TDX
vCPUs. It's functionally not an issue unless userspace stuffs garbage into
KVM's vAPIC, but it's unnecessary work.

Rather than hook this path, I would rather we tag kvm_apic has having some of its
state protected. Then kvm_cpu_has_interrupt() can invoke the alternative,
protected-apic-only hook when appropriate, and kvm_apic_has_interrupt() can bail
immediately instead of doing useless processing of stale vAPIC state.

Note, the below moves the !apic check from tdx_vcpu_reset() to tdx_vcpu_create().
That part should be hoisted earlier in the series, there's no reason to wait until
RESET to perform the check, and I suspect the WARN_ON() can be triggered by userespace.

Compile tested only...

From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Fri, 8 Apr 2022 08:56:27 -0700
Subject: [PATCH] KVM: TDX: Add support for find pending IRQ in a protected
local APIC

Add flag and hook to KVM's local APIC management to support determining
whether or not a TDX guest as a pending IRQ. For TDX vCPUs, the virtual
APIC page is owned by the TDX module and cannot be accessed by KVM. As a
result, registers that are virtualized by the CPU, e.g. PPR, cannot be
read or written by KVM. To deliver interrupts for TDX guests, KVM must
send an IRQ to the CPU on the posted interrupt notification vector. And
to determine if TDX vCPU has a pending interrupt, KVM must check if there
is an outstanding notification.

Return "no interrupt" in kvm_apic_has_interrupt() if the guest APIC is
protected to short-circuit the various other flows that try to pull an
IRQ out of the vAPIC, the only valid operation is querying _if_ an IRQ is
pending, KVM can't do anything based on _which_ IRQ is pending.

Intentionally omit sanity checks from other flows, e.g. PPR update, so as
not to degrade non-TDX guests with unecessary checks. A well-behaved KVM
and userspace will never reach those flows for TDX guests, but reaching
them is not fatal if something does go awry.

Note, this doesn't handle interrupts that have been delivered to the vCPU
but not yet recognized by the core, i.e. interrupts that are sitting in
vmcs.GUEST_INTR_STATUS. Querying that state requires a SEAMCALL and will
be supported in a future patch.

Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/irq.c | 3 +++
arch/x86/kvm/lapic.c | 3 +++
arch/x86/kvm/lapic.h | 2 ++
arch/x86/kvm/vmx/main.c | 11 +++++++++++
arch/x86/kvm/vmx/tdx.c | 9 ++++++---
7 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 7e27b73d839f..ce705d0c6241 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -110,6 +110,7 @@ KVM_X86_OP_NULL(update_pi_irte)
KVM_X86_OP_NULL(start_assignment)
KVM_X86_OP_NULL(apicv_post_state_restore)
KVM_X86_OP_NULL(dy_apicv_has_pending_interrupt)
+KVM_X86_OP_NULL(protected_apic_has_interrupt)
KVM_X86_OP_NULL(set_hv_timer)
KVM_X86_OP_NULL(cancel_hv_timer)
KVM_X86_OP(setup_mce)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 489374a57b66..b3dcc0814461 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1491,6 +1491,7 @@ struct kvm_x86_ops {
void (*start_assignment)(struct kvm *kvm);
void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
+ bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu);

int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
bool *expired);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 172b05343cfd..24f180c538b0 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -96,6 +96,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
if (kvm_cpu_has_extint(v))
return 1;

+ if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
+ return static_call(kvm_x86_protected_apic_has_interrupt)(v);
+
return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
}
EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9322e6340a74..50a483abc0fe 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2503,6 +2503,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
if (!kvm_apic_present(vcpu))
return -1;

+ if (apic->guest_apic_protected)
+ return -1;
+
__apic_update_ppr(apic, &ppr);
return apic_has_interrupt_for_ppr(apic, ppr);
}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 2b44e533fc8d..7b62f1889a98 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -52,6 +52,8 @@ struct kvm_lapic {
bool sw_enabled;
bool irr_pending;
bool lvt0_in_nmi_mode;
+ /* Select registers in the vAPIC cannot be read/written. */
+ bool guest_apic_protected;
/* Number of bits set in ISR. */
s16 isr_count;
/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 882358ac270b..31aab8add010 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -42,6 +42,9 @@ static __init int vt_hardware_setup(void)

tdx_hardware_setup(&vt_x86_ops);

+ if (!enable_tdx)
+ vt_x86_ops.protected_apic_has_interrupt = NULL;
+
if (enable_ept) {
const u64 init_value = enable_tdx ? VMX_EPT_SUPPRESS_VE_BIT : 0ull;
kvm_mmu_set_ept_masks(enable_ept_ad_bits,
@@ -148,6 +151,13 @@ static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
return vmx_vcpu_load(vcpu, cpu);
}

+static bool vt_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+ KVM_BUG_ON(!is_td_vcpu(vcpu), vcpu->kvm);
+
+ return pi_has_pending_interrupt(vcpu);
+}
+
static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
{
if (is_td_vcpu(vcpu))
@@ -297,6 +307,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.sync_pir_to_irr = vmx_sync_pir_to_irr,
.deliver_interrupt = vmx_deliver_interrupt,
.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+ .protected_apic_has_interrupt = vt_protected_apic_has_interrupt,

.set_tss_addr = vmx_set_tss_addr,
.set_identity_map_addr = vmx_set_identity_map_addr,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3a0e826fbe0c..7b9370384ce4 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -467,6 +467,12 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
struct vcpu_tdx *tdx = to_tdx(vcpu);
int ret, i;

+ /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
+ if (!vcpu->arch.apic)
+ return -EINVAL;
+
+ vcpu->arch.apic->guest_apic_protected = true;
+
ret = tdx_alloc_td_page(&tdx->tdvpr);
if (ret)
return ret;
@@ -602,9 +608,6 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
/* TDX doesn't support INIT event. */
if (WARN_ON(init_event))
goto td_bugged;
- /* TDX supports only X2APIC enabled. */
- if (WARN_ON(!vcpu->arch.apic))
- goto td_bugged;
if (WARN_ON(is_td_vcpu_created(tdx)))
goto td_bugged;


base-commit: f88e9fa63cbd87cda9352ee9a86a6f815744be33
--