Re: [PATCH v4 5/6] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit

From: Sean Christopherson
Date: Fri Sep 10 2021 - 18:29:01 EST


On Mon, Aug 09, 2021, Zeng Guang wrote:
> Since IA x86 platform introduce features of IPI virtualization and
> User Interrupts, new behavior applies to the execution of WRMSR ICR

What do User Interrupts have to do with anything?

> register that causes APIC-write VM exit instead of MSR-write VM exit
> in x2APIC mode.

Please lead with what support is actually being added, and more directly state
what the new behavior actually is, e.g. when should KVM expect these types of
traps. The shortlog helps a bit, but APIC-write is somewhat ambiguous without
the context that it refers to the trap-like exits, not exception-like exits on
the WRMSR itself.

Peeking ahead, this probably should be squashed with the next patch that adds
IPI virtualizatio support. Without that patch's code that disables ICR MSR
intercepts for IPIv, this patch makes zero sense.

I'm not totally opposed to splitting IPIv support into two patches, I just don't
like splitting out this tiny subset that makes zero sense without the IPIv
code/context. I assume you took this approach so that the shortlog could be
"KVM: VMX:" for the IPIv code. IMO it's perfectly ok to keep that shortlog even
though there are minor changes outside of vmx/. VMX is the only user of
kvm_apic_write_nodecode(), so it's not wrong to say it affects only VMX.

> This requires KVM to emulate writing 64-bit value to offset 300H on
> the virtual-APIC page(VICR) for guest running in x2APIC mode when

Maybe stylize that as vICR to make it stand out as virtual ICR?

> APIC-wrtie VM exit occurs. Prevoisely KVM doesn't consider this
^^^^^ ^^^^^^^^^^
write Previously

> situation as CPU never produce APIC-write VM exit in x2APIC mode before.
>
> Signed-off-by: Zeng Guang <guang.zeng@xxxxxxxxx>
> ---
> arch/x86/kvm/lapic.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index ba5a27879f1d..0b0f0ce96679 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2188,7 +2188,14 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
> /* hw has done the conditional check and inst decode */
> offset &= 0xff0;
>
> - kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);

Probably worth snapshotting vcpu->arch.apic.

> + if (apic_x2apic_mode(vcpu->arch.apic) && (offset == APIC_ICR)) {


A comment here would be _extremely_ helpful. IIUC, this path is reached when IPIv
is enabled for all ICR writes that can't be virtualized, e.g. broadcast IPIs.

And I'm tempted to say this should WARN and do nothing if KVM gets an exit on
anything except ICR writes.

> + u64 icr_val = *((u64 *)(vcpu->arch.apic->regs + offset));

Maybe just bump "val" to a u64?

Rather than open code this, can't this be:

kvm_lapic_reg_read(apic, offset, 8, &val);
> +
> + kvm_lapic_reg_write(vcpu->arch.apic, APIC_ICR2, (u32)(icr_val>>32));
> + val = (u32)icr_val;

Hmm, this is the third path that open codes the ICR2:ICR split. I think it's
probably worth adding a helper (patch below), and this can become:

void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
{
struct kvm_lapic *apic = vcpu->arch.apic;
u64 val = 0;

/* hw has done the conditional check and inst decode */
offset &= 0xff0;

/* TODO: optimize to just emulate side effect w/o one more write */
if (apic_x2apic_mode(apic)) {
if (WARN_ON_ONCE(offset != APIC_ICR))
return 1;

kvm_lapic_reg_read(apic, offset, 8, &val);
kvm_lapic_reg_write64(apic, offset, val);
} else {
kvm_lapic_reg_read(apic, offset, 4, &val);
kvm_lapic_reg_write(apic, offset, val);
}
}

There is some risk my idea will backfire if the CPU traps other WRMSRs, but even
then the pedant in me thinks the code for that should be:


if (apic_x2apic_mode(apic)) {
int size = offset == APIC_ICR ? 8 : 4;

kvm_lapic_reg_read(apic, offset, size, &val);
kvm_lapic_reg_write64(apic, offset, val);
} else {
...
}

or worst case scenario, move the APIC_ICR check back so that the non-ICR path
back to "if (apic_x2apic_mode(vcpu->arch.apic) && (offset == APIC_ICR))" so that
it naturally falls into the 4-byte read+write.

> + } else {
> + kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
> + }
>
> /* TODO: optimize to just emulate side effect w/o one more write */
> kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
> --
> 2.25.1